Papers

2023

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

1 October 2023·11169 words·53 mins

Chao Huang, Benfeng Wang, Jie Wen, Chengliang Liu, Wei Wang, Li Shen, Xiaochun Cao

Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

1 October 2023·11640 words·55 mins

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi

Hybrid Other

Introduces a specialist visual assistant, Anomaly-OV, leveraging an anomaly expert and visual token selection mechanism to improve zero-shot anomaly detection and reasoning, establishing new datasets and benchmarks in the domain.

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

1 October 2023·8709 words·41 mins

Zhiwei Yang, Jing Liu, Peng Wu

Ucf-Crime Xd-Violence Hybrid Method

Proposes a novel pseudo-label generation and self-training framework incorporating CLIP for text-image alignment, learnable text prompts, normality visual prompts, a pseudo-label generation module guided by normality clues, and a self-adaptive temporal dependence learning module, achieving state-of-the-art performance on benchmark datasets.

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM

1 October 2023·4313 words·21 mins

Shibo Gao, Peipei Yang, Linlin Huang

Ucf-Crime Xd-Violence Shanghaitech Ucsd-Ped Other Semi Supervised Training Free Method

Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

Simplifying Traffic Anomaly Detection with Video Foundation Models

1 October 2023·7027 words·33 mins

Tommie Kerssies, Gijs Dubbelman

Other Semi Supervised Other

Investigates simple encoder-only Video Vision Transformers (Video ViTs) with various pre-training strategies for traffic anomaly detection (TAD). Shows that, with strong pre-training and domain adaptation, a minimal architecture can outperform more complex prior methods, and highlights the importance of pre-training strategies such as Masked Video Modeling (MVM).

Open-Vocabulary Video Anomaly Detection

1 October 2023·7786 words·37 mins

Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Ucf-Crime Xd-Violence Ubnormal Hybrid Other

This paper explores open-vocabulary video anomaly detection (OVVAD) leveraging pre-trained large models to detect and categorize seen and unseen anomalies. It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.

Learning to Understand Open-World Video Anomalies

1 October 2023·11409 words·54 mins

Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen

Shanghaitech Cuhk-Avenue Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Learning Suspected Anomalies from Event Prompts for Video Anomaly Detection

1 October 2023·7307 words·35 mins

Chenchen Tao, Xiaohao Peng, Chong Wang, Jiafei Wu, Puning Zhao, Jun Wang, Jiangbo Qian

Xd-Violence Ucf-Crime Shanghaitech Semi Supervised Other

Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process, pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.

Language-guided Open-world Video Anomaly Detection

1 October 2023·6686 words·32 mins

Zihao Liu, Xiaoyu Wu, Jianqin Wu, Xuxu Wang, Linlin Yang

Ucf-Crime Xd-Violence Ubnormal Ucsd-Ped Other Semi Supervised Unsupervised Hybrid Application

Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

1 October 2023·11025 words·52 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Li Yu, Shanjun Zhang, Nong Sang

Ucf-Crime Other Hybrid Other

Presents a semi-automated hierarchical video annotation framework, combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex, long-term video anomalies across multiple temporal scales.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 October 2023·8242 words·39 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

Shanghaitech Ucf-Crime Xd-Violence Hybrid Method

Presents a framework that leverages multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, and introduces a new dataset, VAD-Instruct50k, with single-frame annotations and explanatory instruction data.

Harnessing Large Language Models for Training-free Video Anomaly Detection

1 October 2023·6913 words·33 mins

Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci

Ucf-Crime Xd-Violence Hybrid Other

Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

1 October 2023·13548 words·64 mins

Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

Shanghaitech Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.

Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection

1 October 2023·4850 words·23 mins

Kun Qian, Tianyu Sun, Wenhong Wang

Other Hybrid Other

Proposes a novel approach (CLAD) leveraging large vision-language models with contrastive cross-modal training for improved industrial anomaly detection and localization, enhancing interpretability and robustness.

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision

1 October 2023·9120 words·43 mins

Yashika Jain, Ali Dabouei, Min Xu

Ucf-Crime Xd-Violence Weakly Supervised Hybrid Method

Proposes a weakly supervised framework that incorporates external unlabeled data during training by estimating prediction bias and adaptively minimizing it using predicted uncertainty, enhancing cross-domain generalization in video anomaly detection.

CLIP: Assisted Video Anomaly Detection

1 October 2023·6463 words·31 mins

Meng Dong

Ucf-Crime Shanghaitech Hybrid Method

Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection

1 October 2023·8733 words·41 mins

Peng Wu, Wanshun Su, Guansong Pang, Yujia Sun, Qingsen Yan, Peng Wang, Yanning Zhang

Xd-Violence Ucf-Crime Shanghaitech Weakly Supervised Hybrid Method

Proposes a weakly supervised framework that leverages audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis

1 October 2023·7812 words·37 mins

Zhiwei Yang, Chen Gao, Jing Liu, Peng Wu, Guansong Pang, Mike Zheng Shou

Other Hybrid Application

Introduces AssistPDA, a framework for real-time online video anomaly prediction, detection, and analysis that leverages vision-language models with a novel spatiotemporal relation distillation module, along with a constructed benchmark dataset, VAPDA-127K.

Anomize: Better Open Vocabulary Video Anomaly Detection

1 October 2023·6692 words·32 mins

Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang

Ucf-Crime Xd-Violence Hybrid Method

Introduces the Anomize framework, which addresses detection ambiguity and categorization confusion in open-vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.
