Other on sis-arxiv-vad-papers

https://phuchoang2603.github.io/sis-arxiv-vad-papers/benchmarks/other/
Recent content in Other on sis-arxiv-vad-papers
Mon, 28 Apr 2025 00:00:00 +0000

Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/sherlock-towards-multi-scene-video-abnormal-event-extraction-and-localization-via-a-global-local-spatial-sensitive-llm/
Mon, 28 Apr 2025 00:00:00 +0000
Proposes a new task (M-VAE) for structured extraction and localization of abnormal events in videos, introduces the Sherlock model with a Global-local Spatial-sensitive MoE module and a Spatial Imbalance Regulator, and demonstrates its effectiveness through extensive experiments.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/
Thu, 13 Feb 2025 00:00:00 +0000
Introduces AnomalyVLM, a framework leveraging hybrid prompts derived from prior knowledge to enhance zero-shot anomaly detection by personalizing vision-language models, incorporating an anomaly region generator and refiner, and utilizing hybrid prompts for category-specific customization and improved detection performance.

Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/
Wed, 17 Apr 2024 00:00:00 +0000
The paper introduces TTHF, a novel single-stage method aligning video clips with text prompts for traffic anomaly detection. It emphasizes modeling high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies. The approach leverages visual-text semantic alignment, modeling temporal high frequency, and guided attention mechanisms, achieving superior performance on benchmark datasets.

SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/smarthome-bench-a-comprehensive-benchmark-for-video-anomaly-detection-in-smart-homes-using-multi-modal-large-language-models/
Thu, 05 Oct 2023 00:00:00 +0000
The paper introduces SmartHome-Bench, the first large-scale dataset and benchmark designed specifically for video anomaly detection (VAD) within smart home environments, incorporating 1,203 annotated videos across seven categories such as Wildlife, Senior Care, and Baby Monitoring. The dataset includes detailed annotations with anomaly tags, descriptions, and rationales, facilitating research on multi-modal large language models (MLLMs) for explainable VAD. It evaluates various adaptation methods, including prompting strategies and a novel taxonomy-driven reflective LLM chain (TRLC), demonstrating significant performance improvements and highlighting current model limitations. The study aims to advance smart home security by providing a dedicated benchmark and novel framework for enhancing MLLM-based anomaly detection and reasoning.

A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/
Sun, 01 Oct 2023 00:00:00 +0000
This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels, adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.

A VLM-based Method for Visual Anomaly Detection in Robotic Scientific Laboratories
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/a-vlm-based-method-for-visual-anomaly-detection-in-robotic-scientific-laboratories/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a vision-language reasoning approach utilizing hierarchical prompts and Chain-of-Thought inference for process anomaly detection in scientific experiments. Constructs a benchmark based on real chemical laboratory workflows and demonstrates improved accuracy with prompt granularity, validated through real-world robotic lab testing.

Anomaly-Led Prompting Learning Caption Generating Model and Benchmark
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anomaly-led_prompting_learning_caption_generating_model_and_benchmark/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces a new task for comprehensive video anomaly captioning, proposes a large-scale benchmark dataset, CVACBench, with fine-grained annotations, and designs a baseline model, AGPFormer, using prompt learning to improve anomaly understanding and description accuracy.

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/assistpda-an-online-video-surveillance-assistant-for-video-anomaly-prediction-detection-and-analysis/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces AssistPDA, a pioneering framework for real-time online video anomaly prediction, detection, and analysis that leverages vision-language models with a novel spatiotemporal relation distillation module and a constructed benchmark dataset, VAPDA-127K.

Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/exploring-large-vision-language-models-for-robust-and-efficient-industrial-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel approach (CLAD) leveraging large vision-language models with contrastive cross-modal training for improved industrial anomaly detection and localization, enhancing interpretability and robustness.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vau-towards-long-term-video-anomaly-understanding-at-any-granularity/
Sun, 01 Oct 2023 00:00:00 +0000
A semi-automated hierarchical video annotation framework combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex and long-term video anomalies across multiple temporal scales.

Language-guided Open-world Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Learning to Understand Open-World Video Anomalies
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Simplifying Traffic Anomaly Detection with Video Foundation Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/simplifying-traffic-anomaly-detection-with-video-foundation-models/
Sun, 01 Oct 2023 00:00:00 +0000
The paper investigates the use of simple encoder-only Video Vision Transformers (Video ViTs) with various pre-training strategies for traffic anomaly detection (TAD). It demonstrates that, with strong pre-training and domain adaptation, minimal architectural complexity can outperform complex prior methods, highlighting the importance of pre-training strategies like Masked Video Modeling (MVM).

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces VAU-R1, a reinforcement fine-tuning framework leveraging Group Relative Policy Optimization (GRPO) to enhance multimodal large language models’ (MLLMs) reasoning capabilities in video anomaly understanding (VAU). Develops VAUBench, a comprehensive Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, supported by multiple evaluation metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency. Demonstrates significant improvements over supervised fine-tuning in question answering accuracy, temporal localization, and interpretability, thereby establishing a scalable, interpretable, and reasoning-aware VAU framework.

Video Anomaly Detection and Explanation via Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/video-anomaly-detection-and-explanation-via-large-language-models/
Sun, 01 Oct 2023 00:00:00 +0000
The paper introduces VAD-LLaMA, a novel framework integrating video-based large language models (VLLMs) for threshold-free, explainable video anomaly detection, featuring a Long-Term Context (LTC) module and a three-phase training process that enhances long-range context modeling and minimizes data annotation costs.

Video Anomaly Detection in 10 Years: A Survey and Outlook
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/
Sun, 01 Oct 2023 00:00:00 +0000
A comprehensive survey exploring deep learning-based video anomaly detection, including emerging paradigms such as weakly supervised, self-supervised, and unsupervised approaches, with a focus on core challenges, feature extraction, supervision schemes, loss functions, regularization techniques, and the potential of vision-language models (VLMs) for enhanced anomaly detection.

VISIONGPT: LLM-ASSISTED REAL-TIME ANOMALY DETECTION FOR SAFE VISUAL NAVIGATION
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/visiongpt-llm-assisted-real-time-anomaly-detection-for-safe-visual-navigation/
Sun, 01 Oct 2023 00:00:00 +0000
A framework combining lightweight object detection and large language models for real-time visual navigation safety and anomaly detection, with dynamic scenario switching and prompt engineering.

Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hierarchical-semantic-contrast-for-scene-aware-video-anomaly-detection/
Sun, 01 Jan 2023 00:00:00 +0000
The paper proposes a hierarchical semantic contrast (HSC) method that leverages scene-aware autoencoders, semantic contrastive learning, and motion augmentation for improved scene-dependent and scene-independent video anomaly detection. It incorporates pre-trained video parsing models, hierarchical contrastive learning at scene and object levels, and skeleton-based motion augmentation to make the normal feature representations more compact and discriminative, thereby enhancing anomaly detection performance.

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/transanomaly_video_anomaly_detection_using_video_vision_transformer/
Mon, 30 Aug 2021 00:00:00 +0000
A prediction-based video anomaly detection approach combining U-Net and Video Vision Transformer (ViViT), with modifications for video prediction, capturing richer temporal and global context information, enabling anomaly localization.