Xd-Violence on sis-arxiv-vad-papers

Recent content in Xd-Violence on sis-arxiv-vad-papers
https://phuchoang2603.github.io/sis-arxiv-vad-papers/benchmarks/xd-violence/
Generator: Hugo (gohugo.io)
Language: en
Last build date: Fri, 20 Jun 2025 00:00:00 +0000

Multimodal VAD: Visual Anomaly Detection in Intelligent Monitoring System via Audio-Vision-Language
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/multimodal_vad_visual_anomaly_detection_in_intelligent_monitoring_system_via_audio-vision-language/
Fri, 20 Jun 2025 00:00:00 +0000
The paper proposes a dual-stream multimodal video anomaly detection network that leverages video, audio, and text modalities to achieve reliable and precise anomaly detection. It introduces effective multimodal fusion, abnormal-aware context prompts (ACPs), and a coarse-support-fine strategy to enhance anomaly discrimination and description, demonstrating superior performance on large-scale datasets.

Networking Systems for Video Anomaly Detection: A Tutorial and Survey
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/
Tue, 01 Apr 2025 00:00:00 +0000
A comprehensive survey and tutorial exploring the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), emphasizing the integration of AI, IoVT, and computing for real-world deployable systems.

AADC-Net: A Multimodal Deep Learning Framework for Automatic Anomaly Detection in Real-Time Surveillance
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aadc-net_a_multimodal_deep_learning_framework_for_automatic_anomaly_detection_in_real-time_surveillance/
Mon, 31 Mar 2025 00:00:00 +0000
Introduces AADC-Net, a multimodal deep neural network leveraging pretrained vision-language models, large language models, and object detection (DETR) for real-time anomaly detection and categorization in surveillance videos. The framework addresses data scarcity, class imbalance, and computational challenges, demonstrating state-of-the-art performance on multiple datasets, with practical deployment in smart gyms and healthcare settings.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/
Thu, 13 Feb 2025 00:00:00 +0000
Introduces AnomalyVLM, a framework that personalizes vision-language models with hybrid prompts derived from prior knowledge to enhance zero-shot anomaly detection. It incorporates an anomaly region generator and refiner, and uses the hybrid prompts for category-specific customization and improved detection performance.

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/
Fri, 10 Jan 2025 00:00:00 +0000
A novel framework (PLOVAD) leveraging prompt tuning on large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection. It incorporates domain-specific and anomaly-specific prompts and a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.

Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/
Wed, 01 Jan 2025 00:00:00 +0000
The paper introduces Ex-VAD, a comprehensive framework for fine-grained and explainable video anomaly detection that leverages visual-language models (VLMs) and large language models (LLMs). It features modules for generating anomaly explanations, fusing multimodal features for coarse detection, and expanding/aligning labels for fine-grained classification, with improved interpretability and accuracy demonstrated on the UCF-Crime and XD-Violence datasets.

Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/
Wed, 17 Apr 2024 00:00:00 +0000
The paper introduces TTHF, a novel single-stage method that aligns video clips with text prompts for traffic anomaly detection. It emphasizes modeling high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies. The approach combines visual-text semantic alignment, temporal high-frequency modeling, and guided attention, achieving superior performance on benchmark datasets.

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadclip-adapting-vision-language-models-for-weakly-supervised-video-anomaly-detection/
Mon, 01 Jan 2024 00:00:00 +0000
A novel paradigm for weakly supervised video anomaly detection that leverages a frozen CLIP model with a dual-branch architecture, temporal modeling modules, and prompt mechanisms to exploit vision-language knowledge for both coarse- and fine-grained detection, achieving state-of-the-art performance on benchmarks.

A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/
Sun, 01 Oct 2023 00:00:00 +0000
This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels and adaptive learning strategies, and explores diverse application areas, including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.

Aligning Effective Tokens with Video Anomaly in Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aligning-effective-tokens-with-video-anomaly-in-large-language-models/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes VA-GPT, a multimodal large language model for video anomaly detection and understanding that uses effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruction-following fine-tuning data and cross-domain benchmarks for robustness evaluation.

Anomize: Better Open Vocabulary Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/li_anomize_better_open_vocabulary_video_anomaly_detection_cvpr_2025_paper/
Sun, 01 Oct 2023 00:00:00 +0000
The paper introduces the Anomize framework, which addresses detection ambiguity and categorization confusion in open vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/cross-domain-learning-for-vad-with-limited-supervision/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a weakly supervised framework that incorporates external unlabeled data during training, estimating prediction bias and adaptively minimizing it using predicted uncertainty to enhance cross-domain generalization in video anomaly detection.

Harnessing Large Language Models for Training-free Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/zanella_harnessing_large_language_models_for_training-free_video_anomaly_detection_cvpr_2024_paper/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/
Sun, 01 Oct 2023 00:00:00 +0000
A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset, VAD-Instruct50k, with single-frame annotations and explanatory instruction data.

Language-guided Open-world Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Learning Suspected Anomalies from Event Prompts for Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/learning-suspected-anomalies-from-event-prompts/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process and pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.

Learning to Understand Open-World Video Anomalies
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces HAWK, a novel framework leveraging interactive large visual language models with explicit and implicit motion modality integration, an auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.

Open-Vocabulary Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/wu_open-vocabulary_video_anomaly_detection_cvpr_2024_paper/
Sun, 01 Oct 2023 00:00:00 +0000
This paper explores open-vocabulary video anomaly detection (OVVAD), leveraging pre-trained large models to detect and categorize seen and unseen anomalies. It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-prompt-with-normality-guidance-for-weakly-supervised-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel pseudo-label generation and self-training framework incorporating CLIP for text-image alignment, learnable text prompts, normality visual prompts, a pseudo-label generation module guided by normality clues, and a self-adaptive temporal dependence learning module, achieving state-of-the-art performance on benchmark datasets.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of multimodal large language models in video anomaly detection and understanding tasks.

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/ye_vera_explainable_video_anomaly_detection_via_verbalized_learning_of_vision-language_cvpr_2025_paper/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces VERA, a framework that enables frozen vision-language models to perform explainable video anomaly detection by learning detailed anomaly-characterization questions from coarsely labeled data, without modifying model parameters. The method decomposes complex reasoning into reflections on guiding questions, optimizes these questions via verbal interactions, and guides VLMs to generate segment- and frame-level anomaly scores with improved explainability and performance.

Video Anomaly Detection in 10 Years: A Survey and Outlook
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/
Sun, 01 Oct 2023 00:00:00 +0000
A comprehensive survey of deep learning-based video anomaly detection, including emerging paradigms such as weakly supervised, self-supervised, and unsupervised approaches, with a focus on core challenges, feature extraction, supervision schemes, loss functions, regularization techniques, and the potential of vision-language models (VLMs) for enhanced anomaly detection.

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/
Mon, 01 May 2023 00:00:00 +0000
Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for analysis by the slow detector, and fuses the outputs of both detectors for robust detection.

Delving into CLIP latent space for Video Anomaly Recognition
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/delving-into-clip-latent-space-for-video-anomaly-recognition/
Sun, 01 Jan 2023 00:00:00 +0000
Proposes AnomalyCLIP, a novel method that leverages Large Language and Vision (LLV) models such as CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling, and demonstrates state-of-the-art performance on multiple benchmarks.

Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mapping
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/
Sun, 01 Jan 2023 00:00:00 +0000
The paper proposes a prompt-based feature mapping framework (PFMF) that generates unseen anomalies of unbounded types and narrows the scene gap for video anomaly detection, outperforming state-of-the-art methods on multiple datasets.

TEVAD: Improved video anomaly detection with captions
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/chen_tevad_improved_video_anomaly_detection_with_captions_cvprw_2023_paper/
Sun, 01 Jan 2023 00:00:00 +0000
Proposes a framework that uses visual features together with text features generated from dense video captions to improve anomaly detection performance and explainability in videos.