Method

2025

Multimodal VAD: Visual Anomaly Detection in Intelligent Monitoring System via Audio-Vision-Language

20 June 2025·8913 words·42 mins

Dicong Wang, Qilong Wang, Qinghua Hu, Kaijun Wu

The paper proposes a dual-stream multimodal video anomaly detection network that leverages video, audio, and text modalities to achieve reliable and precise anomaly detection. It introduces effective multimodal fusion, abnormal-aware context prompts (ACPs), and a coarse-support-fine strategy to enhance anomaly discrimination and description, demonstrating superior performance on large-scale datasets.

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection

10 January 2025·10371 words·49 mins

Chenting Xu, Ke Xu, Xinghao Jiang, Tanfeng Sun

Ucf-Crime Shanghaitech Xd-Violence Ubnormal Weakly Supervised Instruction Tuning Unsupervised Hybrid Method

A novel framework (PLOVAD) leveraging prompt tuning on large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection, incorporating domain-specific and anomaly-specific prompts, and a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.

Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models

1 January 2025·6657 words·32 mins

Chao Huang, Yushu Shi, Jie Wen, Wei Wang, Yong Xu, Xiaochun Cao

Ucf-Crime Xd-Violence Hybrid Method

The paper introduces Ex-VAD, a comprehensive framework for fine-grained and explainable video anomaly detection that leverages visual-language models (VLMs) and large language models (LLMs). It features modules for generating anomaly explanations, fusing multimodal features for coarse detection, and expanding/aligning labels for fine-grained classification, with improved interpretability and accuracy demonstrated on UCF-Crime and XD-Violence datasets.

2024

VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection

1 January 2024·6374 words·30 mins

Changkang Li, Yalong Jiang

Shanghaitech Unsupervised Instruction Tuning Hybrid Method

Proposes VLAVAD, an unsupervised video anomaly detection method that leverages vision-language pre-trained models, using semantic features, a Selective Prompt Adapter, and a Sequence State Space Module to improve interpretability and transferability, achieving state-of-the-art performance on the ShanghaiTech dataset.

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

1 January 2024·6817 words·33 mins

Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

Xd-Violence Ucf-Crime Hybrid Method

A novel paradigm for weakly supervised video anomaly detection that leverages a frozen CLIP model with a dual-branch architecture, temporal modeling modules, and prompt mechanisms to apply vision-language knowledge to both coarse- and fine-grained detection tasks, achieving state-of-the-art performance on benchmarks.
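As a rough illustration of the general image-text matching idea that CLIP-based detectors build on, the sketch below scores one frame embedding against a set of class-prompt embeddings using cosine similarity followed by a softmax. It is not VadCLIP's architecture: the 3-dimensional vectors are toy stand-ins for real CLIP embeddings, and the function names, class names, and `temperature` value are assumptions chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def classify_frame(frame_emb, prompt_embs, temperature=0.07):
    """Softmax over cosine similarities between one frame and each class prompt."""
    logits = [cosine(frame_emb, p) / temperature for p in prompt_embs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(prompt_embs, exps)}


# Toy 3-d vectors standing in for real CLIP text and image embeddings (assumed).
prompt_embs = {
    "normal": (1.0, 0.0, 0.0),
    "fighting": (0.0, 1.0, 0.0),
    "explosion": (0.0, 0.0, 1.0),
}
probs = classify_frame((0.1, 0.9, 0.2), prompt_embs)
```

Here `probs` maps each class name to a probability, and the highest-scoring class is the one whose prompt embedding points most nearly in the frame embedding's direction.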

CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection

1 January 2024·3578 words·17 mins

Apostolos Ntelopoulos, Kamal Nasrollahi

Cuhk-Avenue Shanghaitech Ucf-Crime Ubnormal Weakly Supervised Method

This paper introduces a novel cascade system combining a 3D Autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection, leveraging weak supervision and multimodal capabilities to improve detection and explanation of abnormalities.

2023

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

1 December 2023·8129 words·39 mins

Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sein Kwon, Inpyo Hong, Sanghyun Park

Ubnormal Hybrid Method

Proposes the AnyAnomaly model utilizing large vision language models (LVLMs) for zero-shot, customizable video anomaly detection that detects user-defined anomalies without additional training, incorporating segment-level processing and context-aware visual question answering (VQA). The approach enhances generalization across diverse environments and achieves state-of-the-art results on benchmark datasets, demonstrating practical potential for real-world applications.

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

1 October 2023·8183 words·39 mins

Muchao Ye, Weiyang Liu, Pan He

Ucf-Crime Xd-Violence Hybrid Method

Introduces VERA, a framework that enables frozen vision-language models to perform explainable video anomaly detection by learning detailed anomaly-characterization questions from coarsely labeled data, without model parameter modifications. The method decomposes complex reasoning into reflections on guiding questions, optimizes them via verbal interactions, and guides VLMs to generate segment- and frame-level anomaly scores with improved explainability and performance.

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

1 October 2023·9520 words·45 mins

Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun

Ucf-Crime Shanghaitech Other Hybrid Method

Introduces VAU-R1, a reinforcement fine-tuning framework that uses Group Relative Policy Optimization (GRPO) to enhance the reasoning capabilities of multimodal large language models (MLLMs) in video anomaly understanding (VAU). Develops VAUBench, a comprehensive Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, supported by multiple evaluation metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency. Demonstrates significant improvements over supervised fine-tuning in question answering accuracy, temporal localization, and interpretability, establishing a scalable, interpretable, and reasoning-aware VAU framework.
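Two ingredients named in this summary can be sketched briefly. The code below computes group-relative advantages in the way GRPO is commonly described (each sampled response's reward is standardized against the other responses in its group) and the temporal IoU between a predicted and a ground-truth interval. The function names, the `eps` constant, and the sample numbers are assumptions for illustration, not the VAU-R1 implementation or its reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (assumed GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def temporal_iou(pred, gt):
    """IoU of two (start, end) intervals, both in the same time unit."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical rewards for four responses sampled for the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])

# Hypothetical predicted and ground-truth anomaly intervals, in seconds.
iou = temporal_iou((2.0, 6.0), (4.0, 8.0))
```

Responses that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.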

VADSK: Video Anomaly Detection with Structured Keywords

1 October 2023·6806 words·32 mins

Thomas Foltz

Ucsd-Ped Shanghaitech Cuhk-Avenue Semi Supervised Instruction Tuning Method

A lightweight, interpretable, two-stage video anomaly detection pipeline that uses foundation models for frame description generation and keyword-based classification, achieving performance comparable to state-of-the-art methods with real-time inference and enhanced interpretability.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

1 October 2023·11169 words·53 mins

Chao Huang, Benfeng Wang, Jie Wen, Chengliang Liu, Wei Wang, Li Shen, Xiaochun Cao

Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

1 October 2023·8709 words·41 mins

Zhiwei Yang, Jing Liu, Peng Wu

Ucf-Crime Xd-Violence Hybrid Method

Proposes a novel pseudo-label generation and self-training framework incorporating CLIP for text-image alignment, learnable text prompts, normality visual prompts, a pseudo-label generation module guided by normality clues, and a self-adaptive temporal dependence learning module, achieving state-of-the-art performance on benchmark datasets.

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM

1 October 2023·4313 words·21 mins

Shibo Gao, Peipei Yang, Linlin Huang

Ucf-Crime Xd-Violence Shanghaitech Ucsd-Ped Other Semi Supervised Training Free Method

Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 October 2023·8242 words·39 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

Shanghaitech Ucf-Crime Xd-Violence Hybrid Method

A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

1 October 2023·13548 words·64 mins

Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

Shanghaitech Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision

1 October 2023·9120 words·43 mins

Yashika Jain, Ali Dabouei, Min Xu

Ucf-Crime Xd-Violence Weakly Supervised Hybrid Method

A weakly supervised framework that incorporates external unlabeled data during training by estimating prediction bias and adaptively minimizing it using predicted uncertainty, enhancing cross-domain generalization in video anomaly detection.

CLIP: Assisted Video Anomaly Detection

1 October 2023·6463 words·31 mins

Meng Dong

Ucf-Crime Shanghaitech Hybrid Method

Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection

1 October 2023·8733 words·41 mins

Peng Wu, Wanshun Su, Guansong Pang, Yujia Sun, Qingsen Yan, Peng Wang, Yanning Zhang

Xd-Violence Ucf-Crime Shanghaitech Weakly Supervised Hybrid Method

A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

Anomize: Better Open Vocabulary Video Anomaly Detection

1 October 2023·6692 words·32 mins

Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang

Ucf-Crime Xd-Violence Hybrid Method

The paper introduces the Anomize framework that addresses detection ambiguity and categorization confusion in open vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.

An Attribute-based Method for Video Anomaly Detection

1 October 2023·9752 words·46 mins

Tal Reiss, Yedid Hoshen

Shanghaitech Ucf-Crime Weakly Supervised Semi Supervised Method

A simple attribute-based approach that represents each object by velocity and pose attributes, combining these with deep representations, and uses density estimation for anomaly scoring, achieving state-of-the-art performance.
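To make the scoring idea concrete, here is a minimal sketch in which each object is a small attribute vector and its anomaly score is the mean distance to its k nearest neighbours among features collected from normal video. The k-nearest-neighbour distance is used here as one simple stand-in for density estimation; the summary does not specify the estimator, and the feature layout, function name, and values below are hypothetical.

```python
import math


def knn_anomaly_score(query, normal_bank, k=2):
    """Mean Euclidean distance from `query` to its k nearest normal samples."""
    dists = sorted(math.dist(query, sample) for sample in normal_bank)
    return sum(dists[:k]) / k


# Hypothetical per-object features from normal footage: (speed, pose value).
normal_bank = [(1.0, 0.1), (1.1, 0.2), (0.9, 0.1), (1.0, 0.3)]

# An object moving at a typical speed versus one moving much faster.
walking = knn_anomaly_score((1.0, 0.2), normal_bank)
running = knn_anomaly_score((4.0, 0.2), normal_bank)
```

An object whose attributes fall in a sparsely populated region of the normal feature space, such as the fast-moving one above, lies far from its nearest neighbours and therefore receives a higher score.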
