Hybrid

2023

Open-Vocabulary Video Anomaly Detection

1 October 2023·7786 words·37 mins

Peng Wu , Xuerong Zhou , Guansong Pang , Yujia Sun , Jing Liu , Peng Wang , Yanning Zhang

Ucf-Crime Xd-Violence Ubnormal Hybrid Other

This paper explores open-vocabulary video anomaly detection (OVVAD) leveraging pre-trained large models to detect and categorize seen and unseen anomalies. It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.
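One common way to make classification open-vocabulary is to score a video embedding against text embeddings of category names in a shared space. A label never seen in training can then be handled just by adding its name. The sketch below shows that CLIP-style zero-shot step in isolation. It assumes the embeddings already exist. The function names, temperature value, and toy vectors are hypothetical and are not taken from the paper's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify_open_vocab(video_emb, label_embs, temperature=0.07):
    """Score a video embedding against an arbitrary set of label embeddings.

    Hypothetical sketch: the label set may include categories unseen during
    training, because classification is only similarity in a shared space.
    """
    names = list(label_embs)
    logits = [cosine(video_emb, label_embs[n]) / temperature for n in names]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Toy 3-d embeddings, made up for illustration only.
labels = {
    "fighting": [1.0, 0.1, 0.0],
    "explosion": [0.0, 1.0, 0.1],
    "shoplifting": [0.1, 0.0, 1.0],  # imagine this as an "unseen" category
}
best, probs = classify_open_vocab([0.05, 0.1, 0.9], labels)
print(best)
```

In the paper's disentangled design, a separate class-agnostic detector decides *whether* a segment is anomalous. A similarity step like this one would only name *which* anomaly it is.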

Learning to Understand Open-World Video Anomalies

1 October 2023·11409 words·54 mins

Jiaqi Tang , Hao Lu , Ruizheng Wu , Xiaogang Xu , Ke Ma , Cheng Fang , Bin Guo , Jiangbo Lu , Qifeng Chen , Ying-Cong Chen

Shanghaitech Cuhk-Avenue Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

Introduces HAWK, a novel framework leveraging interactive large vision-language models with explicit and implicit motion modality integration, an auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Language-guided Open-world Video Anomaly Detection

1 October 2023·6686 words·32 mins

Zihao Liu , Xiaoyu Wu , Jianqin Wu , Xuxu Wang , Linlin Yang

Ucf-Crime Xd-Violence Ubnormal Ucsd-Ped Other Semi Supervised Unsupervised Hybrid Application

Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

1 October 2023·11025 words·52 mins

Huaxin Zhang , Xiaohao Xu , Xiang Wang , Jialong Zuo , Xiaonan Huang , Changxin Gao , Li Yu , Shanjun Zhang , Nong Sang

Ucf-Crime Other Hybrid Other

A semi-automated hierarchical video annotation framework combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex and long-term video anomalies across multiple temporal scales.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 October 2023·8242 words·39 mins

Huaxin Zhang , Xiaohao Xu , Xiang Wang , Jialong Zuo , Chuchu Han , Xiaonan Huang , Changxin Gao , Yuehuan Wang , Nong Sang

Shanghaitech Ucf-Crime Xd-Violence Hybrid Method

A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.

Harnessing Large Language Models for Training-free Video Anomaly Detection

1 October 2023·6913 words·33 mins

Luca Zanella , Willi Menapace , Massimiliano Mancini , Yiming Wang , Elisa Ricci

Ucf-Crime Xd-Violence Hybrid Other

Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

1 October 2023·13548 words·64 mins

Yuchen Yang , Kwonjoon Lee , Behzad Dariush , Yinzhi Cao , Shao-Yuan Lo

Shanghaitech Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.

Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection

1 October 2023·4850 words·23 mins

Kun Qian , Tianyu Sun , Wenhong Wang

Other Hybrid Other

Proposes a novel approach (CLAD) leveraging large vision-language models with contrastive cross-modal training for improved industrial anomaly detection and localization, enhancing interpretability and robustness.

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision

1 October 2023·9120 words·43 mins

Yashika Jain , Ali Dabouei , Min Xu

Ucf-Crime Xd-Violence Weakly Supervised Hybrid Method

Proposes a weakly supervised framework that incorporates external unlabeled data during training, estimating prediction bias and adaptively minimizing it using predicted uncertainty, to improve cross-domain generalization in video anomaly detection.
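A generic way to let uncertainty control how much unlabeled external data affects training is to down-weight each sample's loss as its predicted uncertainty grows. The sketch below is a minimal illustration under stated assumptions. It assumes each external clip has a pseudo-label and a scalar uncertainty, and it uses `exp(-u)` as the weight. That rule is a common heuristic chosen here for illustration, not necessarily the paper's formulation, and the function names are hypothetical.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p against target y."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def uncertainty_weighted_loss(preds, pseudo_labels, uncertainties):
    """Weighted mean BCE where uncertain samples count less.

    weight = exp(-u) is an assumed heuristic: u = 0 gives full weight,
    and large u pushes the weight toward zero.
    """
    weights = [math.exp(-u) for u in uncertainties]
    total = sum(w * bce(p, y) for w, p, y in zip(weights, preds, pseudo_labels))
    return total / sum(weights)

# The second sample disagrees strongly with its pseudo-label.
loss_certain = uncertainty_weighted_loss([0.9, 0.1], [1, 1], [0.0, 0.0])
loss_uncertain = uncertainty_weighted_loss([0.9, 0.1], [1, 1], [0.0, 5.0])
```

Marking the disagreeing sample as highly uncertain shrinks its influence on the loss. This is the intuition behind using uncertainty to keep biased predictions on external data from dominating training.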

CLIP: Assisted Video Anomaly Detection

1 October 2023·6463 words·31 mins

Meng Dong

Ucf-Crime Shanghaitech Hybrid Method

Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection

1 October 2023·8733 words·41 mins

Peng Wu , Wanshun Su , Guansong Pang , Yujia Sun , Qingsen Yan , Peng Wang , Yanning Zhang

Xd-Violence Ucf-Crime Shanghaitech Weakly Supervised Hybrid Method

A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis

1 October 2023·7812 words·37 mins

Zhiwei Yang , Chen Gao , Jing Liu , Peng Wu , Guansong Pang , Mike Zheng Shou

Other Hybrid Application

Introduces AssistPDA, a pioneering framework for real-time online video anomaly prediction, detection, and analysis that leverages vision-language models, a novel spatiotemporal relation distillation module, and a newly constructed benchmark dataset, VAPDA-127K.

Anomize: Better Open Vocabulary Video Anomaly Detection

1 October 2023·6692 words·32 mins

Fei Li , Wenxuan Liu , Jingjing Chen , Ruixu Zhang , Yuran Wang , Xian Zhong , Zheng Wang

Ucf-Crime Xd-Violence Hybrid Method

The paper introduces the Anomize framework that addresses detection ambiguity and categorization confusion in open vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.

Anomaly-Led Prompting Learning Caption Generating Model and Benchmark

1 October 2023·12528 words·59 mins

Qianyue Bao , Fang Liu , Licheng Jiao , Yang Liu , Shuo Li , Lingling Li , Xu Liu , Xinyi Wang , Baoliang Chen

Other Hybrid Other

Introduces a new task for comprehensive video anomaly captioning, proposes a large-scale benchmark dataset CVACBench with fine-grained annotations, and designs a baseline model AGPFormer using prompt learning to improve anomaly understanding and description accuracy.

Aligning Effective Tokens with Video Anomaly in Large Language Models

1 October 2023·8317 words·40 mins

Yingxian Chen , Jiahui Liu , Ruidi Fan , Yanwei Li , Chirui Chang , Shizhen Zhao , Wilton W.T.Fok , Xiaojuan Qi , Yik-Chung Wu

Xd-Violence Hybrid Other

Proposes VA-GPT, a multimodal large language model for video anomaly detection and understanding that uses effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruction-following fine-tuning data and cross-domain benchmarks for robustness evaluation.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

1 October 2023·7886 words·38 mins

Canhui Tang , Sanping Zhou , Haoyue Shi , Le Wang

Shanghaitech Ubnormal Ucf-Crime Hybrid Method

Proposes a zero-shot skeleton-based video anomaly detection framework leveraging action semantic typicality and context uniqueness learning, utilizing language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target domain training data.

A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment

1 October 2023·18514 words·87 mins

Ghazal Alinezhad Noghre , Armin Danesh Pazho , Hamed Tabkhi

Cuhk-Avenue Shanghaitech Xd-Violence Ucf-Crime Ucsd-Ped Other Semi Supervised Unsupervised Instruction Tuning Hybrid Survey

This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels and adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model

1 May 2023·9715 words·46 mins

Zongcan Ding , Guansong Pang , Haodong Zhang , Zhiwei Yang , Yanning Zhang , Peng Wu , Peng Wang , Jing Liu , Fang Shen , Changkang Li

Ucsd-Ped Shanghaitech Xd-Violence Ubnormal Semi Supervised Hybrid Method

Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for slow detector analysis, and fuses outputs for robust detection.
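The fast/slow split can be illustrated with a small routing function. It measures how ambiguous each fast-detector score is using binary entropy, sends only the ambiguous segments to the expensive model, and fuses the two scores. This is a minimal sketch. The slow detector is a stand-in callable rather than a real RAG-enhanced VLM, and the entropy threshold, the weighted-average fusion rule, and all names are assumptions. They may differ from SlowFastVAD's actual intervention and fusion.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy in bits of a Bernoulli anomaly probability p (max 1.0 at p = 0.5)."""
    p = min(max(p, eps), 1 - eps)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def route_and_fuse(fast_scores, slow_detector, threshold=0.9, alpha=0.5):
    """Send only ambiguous segments to the slow detector, then fuse scores.

    fast_scores: per-segment anomaly probabilities from the fast detector.
    slow_detector: callable(segment_index) -> probability (hypothetical VLM stand-in).
    threshold: entropy above which a segment is ambiguous (assumed; 0.9 bits
        corresponds roughly to 0.32 < p < 0.68).
    alpha: weight of the fast score in the fused output (assumed fusion rule).
    """
    fused, routed = [], []
    for i, p_fast in enumerate(fast_scores):
        if binary_entropy(p_fast) > threshold:
            p_slow = slow_detector(i)
            routed.append(i)
            fused.append(alpha * p_fast + (1 - alpha) * p_slow)
        else:
            fused.append(p_fast)  # confident segment: keep the cheap score
    return fused, routed

fused, routed = route_and_fuse([0.02, 0.5, 0.97, 0.45], lambda i: 0.9)
```

Here only segments 1 and 3, whose fast scores sit near 0.5, reach the slow detector. This reflects the efficiency argument: the costly model runs only where the cheap one is unsure.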

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 January 2023·10197 words·48 mins

Peng Wu , Jing Liu , Xiangteng He , Yuxin Peng , Peng Wang , Yanning Zhang

Ucf-Crime Shanghaitech Hybrid Other

Proposes a new task called Video Anomaly Retrieval (VAR), introduces two large-scale benchmarks (UCFCrime-AR and XDViolence-AR), and presents a model called Anomaly-Led Alignment Network (ALAN) for VAR, focusing on retrieving long untrimmed videos using cross-modal queries such as language descriptions and synchronous audio. The work introduces anomaly-led sampling, a pretext task (VPMPM), and cross-modal alignment strategies to address the challenges of VAR in practical scenarios.
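Once queries and videos are aligned in a shared embedding space, retrieval reduces to ranking videos by similarity to the query. The sketch below shows that ranking step together with Recall@K, a standard cross-modal retrieval metric. It assumes precomputed embeddings where query `i` is paired with video `i`. The metric choice, the function names, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_videos(query_emb, video_embs):
    """Return video indices sorted from most to least similar to the query."""
    sims = [cosine(query_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda j: sims[j], reverse=True)

def recall_at_k(query_embs, video_embs, k):
    """Fraction of queries whose paired video (same index) appears in the top-k."""
    hits = sum(1 for i, q in enumerate(query_embs) if i in rank_videos(q, video_embs)[:k])
    return hits / len(query_embs)

# Toy 2-d embeddings, made up for illustration only.
videos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
r1 = recall_at_k(queries, videos, 1)
r2 = recall_at_k(queries, videos, 2)
```

The third query is closer to video 0 than to its own video 2, so it misses at K = 1 and hits at K = 2. In this toy case, R@1 is 2/3 and R@2 is 1.0.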

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

1 January 2023·10710 words·51 mins

Rongqin Liang , Yuanman Li , Jiantao Zhou , Xia Li

Cuhk-Avenue Shanghaitech Hybrid Other

Introduces a novel single-stage approach (TTHF) for traffic anomaly detection that aligns video clips with text prompts and models high-frequency temporal changes, enhanced by an attention focusing mechanism, outperforming state-of-the-art methods on benchmark datasets.
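A simple way to isolate high-frequency temporal change is first-order differencing of per-frame features along time. This acts as a crude high-pass filter: static content cancels out and abrupt changes remain. The sketch below is an assumed stand-in that illustrates the idea only. It is not TTHF's actual high-frequency module, and the function names are hypothetical.

```python
def temporal_high_frequency(frames):
    """First-order temporal differences, a crude high-pass filter over time.

    frames: list of per-frame feature vectors of equal length.
    Returns len(frames) - 1 difference vectors; static content becomes zero.
    """
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(frames, frames[1:])]

def motion_energy(diffs):
    """Mean squared magnitude of the difference vectors (0.0 if there are none)."""
    if not diffs:
        return 0.0
    return sum(sum(x * x for x in d) for d in diffs) / len(diffs)

static = [[1, 2], [1, 2], [1, 2]]
moving = [[0, 0], [1, 0], [1, 2]]
e_static = motion_energy(temporal_high_frequency(static))
e_moving = motion_energy(temporal_high_frequency(moving))
```

A static clip yields zero energy, and the changing clip yields 2.5. In a text-driven setup, features like these differences would be aligned with text prompts instead of being thresholded directly.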
