Other

2025

Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM

28 April 2025·8779 words·42 mins

Junxiao Ma, Jingjing Wang, Peiying Yu, Jiamin Luo, Guodong Zhou

Shanghaitech Ucf-Crime Other Semi Supervised Other

Proposes a new task (M-VAE) for structured extraction and localization of abnormal events in videos, introduces the Sherlock model with a Global-local Spatial-sensitive MoE module and a Spatial Imbalance Regulator, and demonstrates its effectiveness through extensive experiments.

AADC-Net: A Multimodal Deep Learning Framework for Automatic Anomaly Detection in Real-Time Surveillance

31 March 2025·10163 words·48 mins

Duc Tri Phan, Vu Hoang Minh Doan, Jaeyeop Choi, Byeongil Lee, Junghwan Oh

Ucf-Crime Xd-Violence Hybrid Other

Introduces AADC-Net, a multimodal deep neural network leveraging pretrained vision-language models, large language models, and object detection (DETR) for real-time anomaly detection and categorization in surveillance videos. The framework addresses data scarcity, imbalance, and computational challenges, demonstrating state-of-the-art performance on multiple datasets, with practical deployment in smart gyms and healthcare settings.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection

13 February 2025·8885 words·42 mins

Yunkang Cao, Xiaohao Xu, Yuqi Cheng, Chen Sun, Zongwei Du, Liang Gao, Weiming Shen

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Weakly Supervised Semi Supervised Training Free Instruction Tuning Unsupervised Hybrid Other

Introduces AnomalyVLM, a framework that personalizes vision-language models for zero-shot anomaly detection using hybrid prompts derived from prior knowledge. It incorporates an anomaly region generator and refiner, and uses the hybrid prompts for category-specific customization to improve detection performance.

2024

Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos

17 April 2024·10204 words·48 mins

Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

The paper introduces TTHF, a novel single-stage method that aligns video clips with text prompts for traffic anomaly detection. It models high-frequency components in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to improve detection of various traffic anomalies. The paper reports superior performance on benchmark datasets.

2023

Video Anomaly Detection and Explanation via Large Language Models

1 October 2023·6711 words·32 mins

Hui Lv, Qianru Sun

Ucf-Crime Other Semi Supervised Other

The paper introduces VAD-LLaMA, a novel framework integrating video-based large language models (VLLMs) for threshold-free, explainable video anomaly detection, featuring a Long-Term Context (LTC) module and a three-phase training process that enhances long-range context modeling and minimizes data annotation costs.

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

1 October 2023·11640 words·55 mins

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi

Hybrid Other

Introduces a specialist visual assistant, Anomaly-OV, leveraging an anomaly expert and visual token selection mechanism to improve zero-shot anomaly detection and reasoning, establishing new datasets and benchmarks in the domain.

Simplifying Traffic Anomaly Detection with Video Foundation Models

1 October 2023·7027 words·33 mins

Tommie Kerssies, Gijs Dubbelman

Other Semi Supervised Other

The paper investigates the use of simple encoder-only Video Vision Transformers (Video ViTs) with various pre-training strategies for traffic anomaly detection (TAD), demonstrating that with strong pretraining and domain adaptation, minimal architectural complexity can outperform complex prior methods, highlighting the importance of pretraining strategies like Masked Video Modeling (MVM).

Open-Vocabulary Video Anomaly Detection

1 October 2023·7786 words·37 mins

Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Ucf-Crime Xd-Violence Ubnormal Hybrid Other

This paper explores open-vocabulary video anomaly detection (OVVAD) leveraging pre-trained large models to detect and categorize seen and unseen anomalies. It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.

Learning to Understand Open-World Video Anomalies

1 October 2023·11409 words·54 mins

Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen

Shanghaitech Cuhk-Avenue Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Learning Suspected Anomalies from Event Prompts for Video Anomaly Detection

1 October 2023·7307 words·35 mins

Chenchen Tao, Xiaohao Peng, Chong Wang, Jiafei Wu, Puning Zhao, Jun Wang, Jiangbo Qian

Xd-Violence Ucf-Crime Shanghaitech Semi Supervised Other

Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process, pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

1 October 2023·11025 words·52 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Li Yu, Shanjun Zhang, Nong Sang

Ucf-Crime Other Hybrid Other

Proposes a semi-automated hierarchical video annotation framework, combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, for comprehensive understanding of complex, long-term video anomalies across multiple temporal scales.

Harnessing Large Language Models for Training-free Video Anomaly Detection

1 October 2023·6913 words·33 mins

Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci

Ucf-Crime Xd-Violence Hybrid Other

Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.

Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection

1 October 2023·4850 words·23 mins

Kun Qian, Tianyu Sun, Wenhong Wang

Other Hybrid Other

Proposes a novel approach (CLAD) leveraging large vision-language models with contrastive cross-modal training for improved industrial anomaly detection and localization, enhancing interpretability and robustness.

Anomaly-Led Prompting Learning Caption Generating Model and Benchmark

1 October 2023·12528 words·59 mins

Qianyue Bao, Fang Liu, Licheng Jiao, Yang Liu, Shuo Li, Lingling Li, Xu Liu, Xinyi Wang, Baoliang Chen

Other Hybrid Other

Introduces a new task for comprehensive video anomaly captioning, proposes a large-scale benchmark dataset CVACBench with fine-grained annotations, and designs a baseline model AGPFormer using prompt learning to improve anomaly understanding and description accuracy.

Aligning Effective Tokens with Video Anomaly in Large Language Models

1 October 2023·8317 words·40 mins

Yingxian Chen, Jiahui Liu, Ruidi Fan, Yanwei Li, Chirui Chang, Shizhen Zhao, Wilton W.T. Fok, Xiaojuan Qi, Yik-Chung Wu

Xd-Violence Hybrid Other

Proposes VA-GPT, a multimodal large language model for video anomaly detection and understanding that uses effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruction-following fine-tuning data and cross-domain benchmarks for robustness evaluation.

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 January 2023·10197 words·48 mins

Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

Ucf-Crime Shanghaitech Hybrid Other

Proposes a new task called Video Anomaly Retrieval (VAR), introduces two large-scale benchmarks (UCFCrime-AR and XDViolence-AR), and presents the Anomaly-Led Alignment Network (ALAN), a model that retrieves long untrimmed videos using cross-modal queries such as language descriptions and synchronous audio. The work introduces anomaly-led sampling, a pretext task (VPMPM), and cross-modal alignment strategies to address the challenges of VAR in practical scenarios.

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

1 January 2023·10710 words·51 mins

Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li

Cuhk-Avenue Shanghaitech Hybrid Other

Introduces a novel single-stage approach (TTHF) for traffic anomaly detection that aligns video clips with text prompts and models high-frequency temporal changes, enhanced by an attention focusing mechanism, outperforming state-of-the-art methods on benchmark datasets.

Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection

1 January 2023·7920 words·38 mins

Shengyang Sun, Xiaojin Gong

Ucsd-Ped Shanghaitech Other Semi Supervised Other

The paper proposes a hierarchical semantic contrast (HSC) method that leverages scene-aware autoencoders, semantic contrastive learning, and motion augmentation for improved scene-dependent and scene-independent video anomaly detection. It incorporates pre-trained video parsing models, hierarchical contrastive learning at scene and object levels, and skeleton-based motion augmentation to make the normal feature representations more compact and discriminative, thereby enhancing anomaly detection performance.

Delving into CLIP latent space for Video Anomaly Recognition

1 January 2023·11434 words·54 mins

Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

Shanghaitech Ucf-Crime Xd-Violence Semi Supervised Other

Proposes AnomalyCLIP, a novel method leveraging Large Language and Vision (LLV) models like CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling; demonstrates state-of-the-art performance on multiple benchmarks.
