Shanghaitech

2025

Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM

28 April 2025·8779 words·42 mins

Junxiao Ma , Jingjing Wang , Peiying Yu , Jiamin Luo , Guodong Zhou

Shanghaitech Ucf-Crime Other Semi Supervised Other

Proposes a new task (M-VAE) for structured extraction and localization of abnormal events in videos, introduces the Sherlock model with a Global-local Spatial-sensitive MoE module and a Spatial Imbalance Regulator, and demonstrates its effectiveness through extensive experiments.

Networking Systems for Video Anomaly Detection: A Tutorial and Survey

1 April 2025·21983 words·104 mins

Jing Liu , Yang Liu , Jieyu Lin , Jielin Li , Liang Cao , Peng Sun , Bo Hu , Liang Song , Azzedine Boukerche , Victor C.M. Leung

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Hybrid Survey

A comprehensive survey and tutorial exploring the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), emphasizing the integration of AI, IoVT, and computing for real-world deployable systems.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection

13 February 2025·8885 words·42 mins

Yunkang Cao , Xiaohao Xu , Yuqi Cheng , Chen Sun , Zongwei Du , Liang Gao , Weiming Shen

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Weakly Supervised Semi Supervised Training Free Instruction Tuning Unsupervised Hybrid Other

Introduces AnomalyVLM, a framework that personalizes vision-language models for zero-shot anomaly detection using hybrid prompts derived from prior knowledge. It incorporates an anomaly region generator and refiner, and uses the hybrid prompts for category-specific customization to improve detection performance.

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection

10 January 2025·10371 words·49 mins

Chenting Xu , Ke Xu , Xinghao Jiang , Tanfeng Sun

Ucf-Crime Shanghaitech Xd-Violence Ubnormal Weakly Supervised Instruction Tuning Unsupervised Hybrid Method

A novel framework (PLOVAD) that applies prompt tuning to large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection. It combines domain-specific and anomaly-specific prompts with a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.

2024

Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos

17 April 2024·10204 words·48 mins

Rongqin Liang , Yuanman Li , Jiantao Zhou , Xia Li

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

The paper introduces TTHF, a novel single-stage method that aligns video clips with text prompts for traffic anomaly detection. It models high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies. Together, these components achieve superior performance on benchmark datasets.

VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection

1 January 2024·6374 words·30 mins

Changkang Li , Yalong Jiang

Shanghaitech Unsupervised Instruction Tuning Hybrid Method

Proposes VLAVAD, an unsupervised video anomaly detection method that leverages vision-language pre-trained models. It uses semantic features, a Selective Prompt Adapter, and a Sequence State Space Module to improve interpretability and transferability, achieving state-of-the-art performance on the ShanghaiTech dataset.

CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection

1 January 2024·3578 words·17 mins

Apostolos Ntelopoulos , Kamal Nasrollahi

Cuhk-Avenue Shanghaitech Ucf-Crime Ubnormal Weakly Supervised Method

This paper introduces a novel cascade system combining a 3D Autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection, leveraging weak supervision and multimodal capabilities to improve detection and explanation of abnormalities.

2023

Video Anomaly Detection in 10 Years: A Survey and Outlook

1 October 2023·18854 words·89 mins

Moshira Abdalla , Sajid Javed , Muaz Al Radi , Anwaar Ulhaq , Naoufel Werghi

Shanghaitech Xd-Violence Ucf-Crime Ucsd-Ped Other Hybrid Survey

A comprehensive survey exploring deep learning-based video anomaly detection, including emerging paradigms such as weakly supervised, self-supervised, and unsupervised approaches, with a focus on core challenges, feature extraction, supervision schemes, loss functions, regularization techniques, and the potential of vision-language models (VLMs) for enhanced anomaly detection.

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

1 October 2023·9520 words·45 mins

Liyun Zhu , Qixiang Chen , Xi Shen , Xiaodong Cun

Ucf-Crime Shanghaitech Other Hybrid Method

Introduces VAU-R1, a reinforcement fine-tuning framework leveraging Group Relative Policy Optimization (GRPO) to enhance multimodal large language models’ (MLLMs) reasoning capabilities in video anomaly understanding (VAU). Develops VAUBench, a comprehensive Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, supported by multiple evaluation metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency. Demonstrates significant improvements over supervised fine-tuning in question answering accuracy, temporal localization, and interpretability, thereby establishing a scalable, interpretable, and reasoning-aware VAU framework.

VADSK: Video Anomaly Detection with Structured Keywords

1 October 2023·6806 words·32 mins

Thomas Foltz

Ucsd-Ped Shanghaitech Cuhk-Avenue Semi Supervised Instruction Tuning Method

A lightweight, two-stage video anomaly detection pipeline that uses foundation models for frame description generation and keyword-based classification, achieving performance comparable to state-of-the-art methods with real-time inference and enhanced interpretability.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

1 October 2023·11169 words·53 mins

Chao Huang , Benfeng Wang , Jie Wen , Chengliang Liu , Wei Wang , Li Shen , Xiaochun Cao

Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM

1 October 2023·4313 words·21 mins

Shibo Gao , Peipei Yang , Linlin Huang

Ucf-Crime Xd-Violence Shanghaitech Ucsd-Ped Other Semi Supervised Training Free Method

Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

Learning to Understand Open-World Video Anomalies

1 October 2023·11409 words·54 mins

Jiaqi Tang , Hao Lu , Ruizheng Wu , Xiaogang Xu , Ke Ma , Cheng Fang , Bin Guo , Jiangbo Lu , Qifeng Chen , Ying-Cong Chen

Shanghaitech Cuhk-Avenue Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Learning Suspected Anomalies from Event Prompts for Video Anomaly Detection

1 October 2023·7307 words·35 mins

Chenchen Tao , Xiaohao Peng , Chong Wang , Jiafei Wu , Puning Zhao , Jun Wang , Jiangbo Qian

Xd-Violence Ucf-Crime Shanghaitech Semi Supervised Other

Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process, pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 October 2023·8242 words·39 mins

Huaxin Zhang , Xiaohao Xu , Xiang Wang , Jialong Zuo , Chuchu Han , Xiaonan Huang , Changxin Gao , Yuehuan Wang , Nong Sang

Shanghaitech Ucf-Crime Xd-Violence Hybrid Method

A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

1 October 2023·13548 words·64 mins

Yuchen Yang , Kwonjoon Lee , Behzad Dariush , Yinzhi Cao , Shao-Yuan Lo

Shanghaitech Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.

CLIP-Assisted Video Anomaly Detection

1 October 2023·6463 words·31 mins

Meng Dong

Ucf-Crime Shanghaitech Hybrid Method

Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection

1 October 2023·8733 words·41 mins

Peng Wu , Wanshun Su , Guansong Pang , Yujia Sun , Qingsen Yan , Peng Wang , Yanning Zhang

Xd-Violence Ucf-Crime Shanghaitech Weakly Supervised Hybrid Method

A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

An Attribute-based Method for Video Anomaly Detection

1 October 2023·9752 words·46 mins

Tal Reiss , Yedid Hoshen

Shanghaitech Ucf-Crime Weakly Supervised Semi Supervised Method

A simple attribute-based approach that represents each object by velocity and pose attributes, combining these with deep representations, and uses density estimation for anomaly scoring, achieving state-of-the-art performance.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

1 October 2023·7886 words·38 mins

Canhui Tang , Sanping Zhou , Haoyue Shi , Le Wang

Shanghaitech Ubnormal Ucf-Crime Semi Supervised Instruction Tuning Method

Proposes a zero-shot skeleton-based video anomaly detection framework utilizing action semantic typicality and context uniqueness learning, involving a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target domain training data.
