Shanghaitech on sis-arxiv-vad-papers

Shanghaitech on sis-arxiv-vad-papershttps://phuchoang2603.github.io/sis-arxiv-vad-papers/benchmarks/shanghaitech/Recent content in Shanghaitech on sis-arxiv-vad-papersHugo -- gohugo.ioenMon, 28 Apr 2025 00:00:00 +0000Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLMhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/sherlock-towards-multi-scene-video-abnormal-event-extraction-and-localization-via-a-global-local-spatial-sensitive-llm/Mon, 28 Apr 2025 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/sherlock-towards-multi-scene-video-abnormal-event-extraction-and-localization-via-a-global-local-spatial-sensitive-llm/Proposes a new task (M-VAE) for structured extraction and localization of abnormal events in videos, introduces Sherlock model with a Global-local Spatial-sensitive MoE module and a Spatial Imbalance Regulator, and demonstrates its effectiveness through extensive experiments.Networking Systems for Video Anomaly Detection: A Tutorial and Surveyhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/Tue, 01 Apr 2025 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/A comprehensive survey and tutorial exploring the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), emphasizing the integration of AI, IoVT, and computing for real-world deployable systems.Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/Thu, 13 Feb 2025 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/Introduces AnomalyVLM, a framework leveraging hybrid prompts derived from prior knowledge to enhance zero-shot anomaly detection by personalizing vision-language models, incorporating an anomaly region generator and refiner, and utilizing hybrid prompts for category-specific customization and improved detection performance.PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/Fri, 10 Jan 2025 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/A novel framework (PLOVAD) leveraging prompt tuning on large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection, incorporating domain-specific and anomaly-specific prompts, and a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videoshttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/Wed, 17 Apr 2024 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/The paper introduces TTHF, a novel single-stage method aligning video clips with text prompts for traffic anomaly detection. It emphasizes modeling high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies. The approach leverages visual-text semantic alignment, modeling temporal high frequency, and guided attention mechanisms, achieving superior performance on benchmark datasets.CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/callm_cascading_autoencoder_and_large_language_model_for_video_anomaly_detection/Mon, 01 Jan 2024 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/callm_cascading_autoencoder_and_large_language_model_for_video_anomaly_detection/This paper introduces a novel cascade system combining a 3D Autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection, leveraging weak supervision and multimodal capabilities to improve detection and explanation of abnormalities.VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vlavad-vision-language-models-assisted-unsupervised-vad/Mon, 01 Jan 2024 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vlavad-vision-language-models-assisted-unsupervised-vad/Proposes VLAVAD, an unsupervised video anomaly detection method leveraging vision-language pre-trained models, utilizing semantic features, Selective Prompt Adapter, and Sequence State Space Module to improve interpretability and transferability, achieving state-of-the-art performance on the ShanghaiTech dataset.A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environmenthttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels, adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/Proposes a zero-shot skeleton-based video anomaly detection framework leveraging action semantic typicality and context uniqueness learning, utilizing language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target domain training data.Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/Proposes a zero-shot skeleton-based video anomaly detection framework utilizing action semantic typicality and context uniqueness learning, involving a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target domain training data.An Attribute-based Method for Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/an-attribute-based-method-for-video-anomaly-detection/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/an-attribute-based-method-for-video-anomaly-detection/A simple attribute-based approach that represents each object by velocity and pose attributes, combining these with deep representations, and uses density estimation for anomaly scoring, achieving state-of-the-art performance.AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.CLIP: Assisted Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/clip-assisted/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/clip-assisted/Proposes a generalized framework for video anomaly detection based on CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experimentation on UCF-Crime and ShanghaiTech datasets.Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Modelshttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLMhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.Learning Suspected Anomalies from Event Prompts for Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/learning-suspected-anomalies-from-event-prompts/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/learning-suspected-anomalies-from-event-prompts/Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process, pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.Learning to Understand Open-World Video Anomalieshttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLMhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thoughthttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces Vad-Reasoning dataset, along with an improved reinforcement learning algorithm AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.VADSK: VIDEO ANOMALY DETECTION WITH STRUCTURED KEYWORDShttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadsk-video-anomaly-detection-with-structured/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadsk-video-anomaly-detection-with-structured/A lightweight, interpretable, two-stage video anomaly detection pipeline employing foundational models for frame description generation and keyword-based classification, achieving comparable performance to state-of-the-art methods with real-time inference and enhanced interpretability.VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuninghttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/Introduces VAU-R1, a reinforcement fine-tuning framework leveraging Group Relative Policy Optimization (GRPO) to enhance multimodal large language models’ (MLLMs) reasoning capabilities in video anomaly understanding (VAU). Develops VAUBench, a comprehensive Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, supported by multiple evaluation metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency. Demonstrates significant improvements over supervised fine-tuning in question answering accuracy, temporal localization, and interpretability, thereby establishing a scalable, interpretable, and reasoning-aware VAU framework.Video Anomaly Detection in 10 Years: A Survey and Outlookhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/Sun, 01 Oct 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/A comprehensive survey exploring deep learning-based video anomaly detection, including emerging paradigms such as weakly supervised, self-supervised, and unsupervised approaches, with a focus on core challenges, feature extraction, supervision schemes, loss functions, regularization techniques, and the potential of vision-language models (VLMs) for enhanced anomaly detection.Advanced Video Anomaly Detection Using Deep Learninghttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vane-bench-video-anomaly-evaluation/Sat, 15 Jul 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vane-bench-video-anomaly-evaluation/This paper introduces a novel deep learning framework for detecting anomalies in video content by leveraging semi-supervised approaches that require minimal labeled data, enhancing robustness and efficiency.SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Modelhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/Mon, 01 May 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for slow detector analysis, and fuses outputs for robust detection.Delving into CLIP latent space for Video Anomaly Recognitionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/delving-into-clip-latent-space-for-video-anomaly-recognition/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/delving-into-clip-latent-space-for-video-anomaly-recognition/Proposes AnomalyCLIP, a novel method leveraging Large Language and Vision (LLV) models like CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling; demonstrates state-of-the-art performance on multiple benchmarks.Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mappinghttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/The paper proposes a prompt-based feature mapping framework (PFMF) to generate unseen anomalies with unbounded types and narrow the scene gap for video anomaly detection, outperforming state-of-the-art methods on multiple datasets.Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detectionhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hierarchical-semantic-contrast-for-scene-aware-video-anomaly-detection/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hierarchical-semantic-contrast-for-scene-aware-video-anomaly-detection/The paper proposes a hierarchical semantic contrast (HSC) method that leverages scene-aware autoencoders, semantic contrastive learning, and motion augmentation for improved scene-dependent and scene-independent video anomaly detection. It incorporates pre-trained video parsing models, hierarchical contrastive learning at scene and object levels, and skeleton-based motion augmentation to make the normal feature representations more compact and discriminative, thereby enhancing anomaly detection performance.TEVAD: Improved video anomaly detection with captionshttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/chen_tevad_improved_video_anomaly_detection_with_captions_cvprw_2023_paper/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/chen_tevad_improved_video_anomaly_detection_with_captions_cvprw_2023_paper/Proposes a framework that utilizes both visual and text features, generated through dense video captions, to enhance anomaly detection performance and explainability in videos.Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videoshttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven-traffic-anomaly-detection-with-temporal-high-frequency/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven-traffic-anomaly-detection-with-temporal-high-frequency/Introduces a novel single-stage approach (TTHF) for traffic anomaly detection that aligns video clips with text prompts and models high-frequency temporal changes, enhanced by an attention focusing mechanism, outperforming state-of-the-art methods on benchmark datasets.Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Modelhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/toward-video-anomaly-retrieval-from-video/Sun, 01 Jan 2023 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/toward-video-anomaly-retrieval-from-video/Proposes a new task called Video Anomaly Retrieval (VAR), introduces two large-scale benchmarks (UCFCrime-AR and XDViolence-AR), and presents a model called Anomaly-Led Alignment Network (ALAN) for VAR, focusing on retrieving long untrimmed videos using cross-modal queries such as language descriptions and synchronous audios. The work introduces anomaly-led sampling, a pretext task (VPMPM), and cross-modal alignment strategies to address the challenges of VAR in practical scenarios.TransAnomaly: Video Anomaly Detection Using Video Vision Transformerhttps://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/transanomaly_video_anomaly_detection_using_video_vision_transformer/Mon, 30 Aug 2021 00:00:00 +0000https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/transanomaly_video_anomaly_detection_using_video_vision_transformer/A prediction-based video anomaly detection approach combining U-Net and Video Vision Transformer (ViViT), with modifications for video prediction, capturing richer temporal and global context information, enabling anomaly localization.