Skip to main content

Ucf-Crime

2023

Language-guided Open-world Video Anomaly Detection

Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

A semi-automated hierarchical video annotation framework combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex and long-term video anomalies across multiple temporal scales.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.

Harnessing Large Language Models for Training-free Video Anomaly Detection

Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.

CLIP: Assisted Video Anomaly Detection

·6463 words·31 mins
Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.

Anomize: Better Open Vocabulary Video Anomaly Detection

The paper introduces the Anomize framework that addresses detection ambiguity and categorization confusion in open vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

Proposes a zero-shot skeleton-based video anomaly detection framework utilizing action semantic typicality and context uniqueness learning, involving a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target domain training data.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection

Proposes a zero-shot skeleton-based video anomaly detection framework leveraging action semantic typicality and context uniqueness learning, utilizing language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target domain training data.

A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment

This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels, adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

Proposes a new task called Video Anomaly Retrieval (VAR), introduces two large-scale benchmarks (UCFCrime-AR and XDViolence-AR), and presents a model called Anomaly-Led Alignment Network (ALAN) for VAR, focusing on retrieving long untrimmed videos using cross-modal queries such as language descriptions and synchronous audios. The work introduces anomaly-led sampling, a pretext task (VPMPM), and cross-modal alignment strategies to address the challenges of VAR in practical scenarios.

Delving into CLIP latent space for Video Anomaly Recognition

Proposes AnomalyCLIP, a novel method leveraging Large Language and Vision (LLV) models like CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling; demonstrates state-of-the-art performance on multiple benchmarks.

2021