Ubnormal

2025

Networking Systems for Video Anomaly Detection: A Tutorial and Survey

1 April 2025·21983 words·104 mins

Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C.M. Leung

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Hybrid Survey

A comprehensive survey and tutorial exploring the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), emphasizing the integration of AI, IoVT, and computing for real-world deployable systems.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection

13 February 2025·8885 words·42 mins

Yunkang Cao, Xiaohao Xu, Yuqi Cheng, Chen Sun, Zongwei Du, Liang Gao, Weiming Shen

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Weakly Supervised Semi Supervised Training Free Instruction Tuning Unsupervised Hybrid Other

Introduces AnomalyVLM, a framework that personalizes vision-language models for zero-shot anomaly detection using hybrid prompts derived from prior knowledge. It incorporates an anomaly region generator and refiner, and the hybrid prompts provide category-specific customization to improve detection performance.

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection

10 January 2025·10371 words·49 mins

Chenting Xu, Ke Xu, Xinghao Jiang, Tanfeng Sun

Ucf-Crime Shanghaitech Xd-Violence Ubnormal Weakly Supervised Instruction Tuning Unsupervised Hybrid Method

A novel framework (PLOVAD) leveraging prompt tuning on large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection, incorporating domain-specific and anomaly-specific prompts, and a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.

2024

Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos

17 April 2024·10204 words·48 mins

Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li

Cuhk-Avenue Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

The paper introduces TTHF, a novel single-stage method aligning video clips with text prompts for traffic anomaly detection. It emphasizes modeling high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies. The approach leverages visual-text semantic alignment, modeling temporal high frequency, and guided attention mechanisms, achieving superior performance on benchmark datasets.

CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection

1 January 2024·3578 words·17 mins

Apostolos Ntelopoulos, Kamal Nasrollahi

Cuhk-Avenue Shanghaitech Ucf-Crime Ubnormal Weakly Supervised Method

This paper introduces a novel cascade system combining a 3D Autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection, leveraging weak supervision and multimodal capabilities to improve detection and explanation of abnormalities.

2023

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

1 December 2023·8129 words·39 mins

Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sein Kwon, Inpyo Hong, Sanghyun Park

Ubnormal Hybrid Method

Proposes the AnyAnomaly model utilizing large vision language models (LVLMs) for zero-shot, customizable video anomaly detection that detects user-defined anomalies without additional training, incorporating segment-level processing and context-aware visual question answering (VQA). The approach enhances generalization across diverse environments and achieves state-of-the-art results on benchmark datasets, demonstrating practical potential for real-world applications.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

1 October 2023·11169 words·53 mins

Chao Huang, Benfeng Wang, Jie Wen, Chengliang Liu, Wei Wang, Li Shen, Xiaochun Cao

Shanghaitech Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Method

Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces Vad-Reasoning dataset, along with an improved reinforcement learning algorithm AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.

Open-Vocabulary Video Anomaly Detection

1 October 2023·7786 words·37 mins

Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Ucf-Crime Xd-Violence Ubnormal Hybrid Other

This paper explores open-vocabulary video anomaly detection (OVVAD) leveraging pre-trained large models to detect and categorize seen and unseen anomalies. It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.

Learning to Understand Open-World Video Anomalies

1 October 2023·11409 words·54 mins

Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen

Shanghaitech Cuhk-Avenue Xd-Violence Ubnormal Ucf-Crime Ucsd-Ped Other Hybrid Other

Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Language-guided Open-world Video Anomaly Detection

1 October 2023·6686 words·32 mins

Zihao Liu, Xiaoyu Wu, Jianqin Wu, Xuxu Wang, Linlin Yang

Ucf-Crime Xd-Violence Ubnormal Ucsd-Ped Other Semi Supervised Unsupervised Hybrid Application

Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

1 October 2023·7886 words·38 mins

Canhui Tang, Sanping Zhou, Haoyue Shi, Le Wang

Shanghaitech Ubnormal Ucf-Crime Semi Supervised Instruction Tuning Method

Proposes a zero-shot skeleton-based video anomaly detection framework utilizing action semantic typicality and context uniqueness learning, involving a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target domain training data.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

1 October 2023·7886 words·38 mins

Canhui Tang, Sanping Zhou, Haoyue Shi, Le Wang

Shanghaitech Ubnormal Ucf-Crime Hybrid Method

Proposes a zero-shot skeleton-based video anomaly detection framework leveraging action semantic typicality and context uniqueness learning, utilizing language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target domain training data.

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model

1 May 2023·9715 words·46 mins

Zongcan Ding, Guansong Pang, Haodong Zhang, Zhiwei Yang, Yanning Zhang, Peng Wu, Peng Wang, Jing Liu, Fang Shen, Changkang Li

Ucsd-Ped Shanghaitech Xd-Violence Ubnormal Semi Supervised Hybrid Method

Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for slow detector analysis, and fuses outputs for robust detection.
