Method on sis-arxiv-vad-papers

https://phuchoang2603.github.io/sis-arxiv-vad-papers/type/method/
Recent content in Method on sis-arxiv-vad-papers
Last updated: Fri, 20 Jun 2025 00:00:00 +0000

Multimodal VAD: Visual Anomaly Detection in Intelligent Monitoring System via Audio-Vision-Language
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/multimodal_vad_visual_anomaly_detection_in_intelligent_monitoring_system_via_audio-vision-language/
Fri, 20 Jun 2025 00:00:00 +0000
The paper proposes a dual-stream multimodal video anomaly detection network that combines video, audio, and text modalities for reliable and precise anomaly detection. It introduces multimodal fusion, abnormal-aware context prompts (ACPs), and a coarse-support-fine strategy to improve anomaly discrimination and description, and reports superior performance on large-scale datasets.

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/
Fri, 10 Jan 2025 00:00:00 +0000
A framework (PLOVAD) that applies prompt tuning to large-scale pretrained image-based vision-language models for open vocabulary video anomaly detection. It combines domain-specific and anomaly-specific prompts with a temporal module to detect and categorize both seen and unseen anomalies while tuning only a limited number of parameters.

Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/
Wed, 01 Jan 2025 00:00:00 +0000
The paper introduces Ex-VAD, a framework for fine-grained and explainable video anomaly detection that leverages visual-language models (VLMs) and large language models (LLMs). It features modules for generating anomaly explanations, fusing multimodal features for coarse detection, and expanding and aligning labels for fine-grained classification, with improved interpretability and accuracy demonstrated on the UCF-Crime and XD-Violence datasets.

CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/callm_cascading_autoencoder_and_large_language_model_for_video_anomaly_detection/
Mon, 01 Jan 2024 00:00:00 +0000
This paper introduces a cascade system that combines a 3D autoencoder with a large visual language model (LVLM) for video anomaly detection, leveraging weak supervision and multimodal capabilities to improve both the detection and the explanation of abnormalities.

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadclip-adapting-vision-language-models-for-weakly-supervised-video-anomaly-detection/
Mon, 01 Jan 2024 00:00:00 +0000
A weakly supervised video anomaly detection paradigm that builds on a frozen CLIP model with a dual-branch architecture, temporal modeling modules, and prompt mechanisms, using vision-language knowledge for both coarse- and fine-grained detection and achieving state-of-the-art performance on benchmarks.

VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vlavad-vision-language-models-assisted-unsupervised-vad/
Mon, 01 Jan 2024 00:00:00 +0000
Proposes VLAVAD, an unsupervised video anomaly detection method that leverages vision-language pre-trained models through semantic features, a Selective Prompt Adapter, and a Sequence State Space Module to improve interpretability and transferability, achieving state-of-the-art performance on the ShanghaiTech dataset.

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anyanomaly-zero-shot-customizable-video-anomaly-detection-with-lvlm/
Fri, 01 Dec 2023 00:00:00 +0000
Proposes the AnyAnomaly model, which uses large vision-language models (LVLMs) for zero-shot, customizable video anomaly detection: it detects user-defined anomalies without additional training, using segment-level processing and context-aware visual question answering (VQA). The approach generalizes across diverse environments, achieves state-of-the-art results on benchmark datasets, and shows practical potential for real-world applications.

A VLM-based Method for Visual Anomaly Detection in Robotic Scientific Laboratories
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/a-vlm-based-method-for-visual-anomaly-detection-in-robotic-scientific-laboratories/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a vision-language reasoning approach that uses hierarchical prompts and Chain-of-Thought inference for process anomaly detection in scientific experiments.
Constructs a benchmark based on real chemical laboratory workflows, demonstrates improved accuracy with increasing prompt granularity, and validates the approach through real-world robotic lab testing.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a zero-shot skeleton-based video anomaly detection framework built on action semantic typicality and context uniqueness learning, using language-guided semantic modeling and test-time scene-adaptive boundaries to generalize without target-domain training data.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a zero-shot skeleton-based video anomaly detection framework using action semantic typicality and context uniqueness learning, with a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target-domain training data.

An Attribute-based Method for Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/an-attribute-based-method-for-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
A simple attribute-based approach that represents each object by velocity and pose attributes, combines these with deep representations, and uses density estimation for anomaly scoring, achieving state-of-the-art performance.

Anomize: Better Open Vocabulary Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/li_anomize_better_open_vocabulary_video_anomaly_detection_cvpr_2025_paper/
Sun, 01 Oct 2023 00:00:00 +0000
The paper introduces the Anomize framework, which addresses detection ambiguity and categorization confusion in open vocabulary video anomaly detection (OVVAD) through visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.

AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
A weakly supervised framework that uses audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.

CLIP: Assisted Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/clip-assisted/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a generalized CLIP-based framework for video anomaly detection that introduces generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experiments on the UCF-Crime and ShanghaiTech datasets.

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/cross-domain-learning-for-vad-with-limited-supervision/
Sun, 01 Oct 2023 00:00:00 +0000
A weakly supervised framework that incorporates external unlabeled data during training by estimating prediction bias and adaptively minimizing it using predicted uncertainty, improving cross-domain generalization in video anomaly detection.

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes AnomalyRuler, a rule-based reasoning framework for video anomaly detection with large language models that adapts quickly to new scenarios through few-normal-shot prompting and gains robustness through strategic modules.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/
Sun, 01 Oct 2023 00:00:00 +0000
A framework that uses multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset, VAD-Instruct50k, with single-frame annotations and explanatory instruction data.

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a training-free video anomaly detection method that uses multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definitions without retraining.

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-prompt-with-normality-guidance-for-weakly-supervised-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a pseudo-label generation and self-training framework that uses CLIP for text-image alignment, together with learnable text prompts, normality visual prompts, a pseudo-label generation module guided by normality clues, and a self-adaptive temporal dependence learning module, achieving state-of-the-art performance on benchmark datasets.

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a structured Perception-to-Cognition Chain-of-Thought, the Vad-Reasoning dataset, and an improved reinforcement learning algorithm, AVA-GRPO, to strengthen the deep reasoning capabilities of multimodal large language models in video anomaly detection and understanding tasks.

VADSK: Video Anomaly Detection with Structured Keywords
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadsk-video-anomaly-detection-with-structured/
Sun, 01 Oct 2023 00:00:00 +0000
A lightweight, interpretable, two-stage video anomaly detection pipeline that uses foundation models for frame description generation and keyword-based classification, achieving performance comparable to state-of-the-art methods with real-time inference and improved interpretability.

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces VAU-R1, a reinforcement fine-tuning framework that uses Group Relative Policy Optimization (GRPO) to enhance the reasoning capabilities of multimodal large language models (MLLMs) in video anomaly understanding (VAU). It also develops VAUBench, a Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, evaluated with multiple metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency. The framework shows significant improvements over supervised fine-tuning in question-answering accuracy, temporal localization, and interpretability, establishing a scalable, interpretable, and reasoning-aware VAU framework.

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/ye_vera_explainable_video_anomaly_detection_via_verbalized_learning_of_vision-language_cvpr_2025_paper/
Sun, 01 Oct 2023 00:00:00 +0000
Introduces VERA, a framework that enables frozen vision-language models to perform explainable video anomaly detection by learning detailed anomaly-characterization questions from coarsely labeled data, without modifying model parameters.
The method decomposes complex reasoning into reflections on guiding questions, optimizes these questions through verbal interactions, and guides VLMs to generate segment- and frame-level anomaly scores with improved explainability and performance.

Advanced Video Anomaly Detection Using Deep Learning
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vane-bench-video-anomaly-evaluation/
Sat, 15 Jul 2023 00:00:00 +0000
This paper introduces a deep learning framework for detecting anomalies in video content using semi-supervised approaches that require minimal labeled data, improving robustness and efficiency.

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/
Mon, 01 May 2023 00:00:00 +0000
Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection.
It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for analysis by the slow detector, and fuses the outputs of both detectors for robust detection.

Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mapping
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/
Sun, 01 Jan 2023 00:00:00 +0000
The paper proposes a prompt-based feature mapping framework (PFMF) that generates unseen anomalies of unbounded types and narrows the scene gap for video anomaly detection, outperforming state-of-the-art methods on multiple datasets.

TEVAD: Improved video anomaly detection with captions
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/chen_tevad_improved_video_anomaly_detection_with_captions_cvprw_2023_paper/
Sun, 01 Jan 2023 00:00:00 +0000
Proposes a framework that uses both visual features and text features generated through dense video captioning to improve anomaly detection performance and explainability.

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/transanomaly_video_anomaly_detection_using_video_vision_transformer/
Mon, 30 Aug 2021 00:00:00 +0000
A prediction-based video anomaly detection approach that combines U-Net with a Video Vision Transformer (ViViT), modified for video prediction, to capture richer temporal and global context and to enable anomaly localization.
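
The SlowFastVAD entry mentions an entropy-based intervention strategy that sends ambiguous segments to the slow detector and then fuses both outputs. The sketch below illustrates that routing idea under stated assumptions: the fast detector emits a per-segment anomaly probability in [0, 1], ambiguity is measured by binary entropy, and fusion is a simple weighted average. The threshold, the weight, and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli score p: 0 when certain, 1.0 at p == 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def route_segments(fast_scores: Sequence[float], entropy_threshold: float = 0.9) -> List[int]:
    """Indices of segments whose fast score is ambiguous (entropy above the
    hypothetical threshold) and should be re-examined by the slow detector."""
    return [i for i, p in enumerate(fast_scores) if binary_entropy(p) > entropy_threshold]


def slow_fast_scores(
    fast_scores: Sequence[float],
    slow_detector: Callable[[int], float],
    entropy_threshold: float = 0.9,
    weight: float = 0.5,
) -> List[float]:
    """Keep confident fast scores as-is; for routed segments only, call the
    (expensive) slow detector and blend its score with the fast one."""
    routed = set(route_segments(fast_scores, entropy_threshold))
    return [
        weight * p + (1.0 - weight) * slow_detector(i) if i in routed else p
        for i, p in enumerate(fast_scores)
    ]


if __name__ == "__main__":
    fast = [0.02, 0.48, 0.95, 0.6]
    print(route_segments(fast))  # [1, 3]
    # Stand-in slow detector: a real one would query the RAG-enhanced VLM.
    print(slow_fast_scores(fast, slow_detector=lambda i: 1.0))
```

The point of the design is cost: only the segments the fast detector is unsure about (scores near 0.5) reach the slow model, so confident segments never pay for a VLM call.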
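
The attribute-based entry scores objects by density estimation over velocity and pose attributes. One common nonparametric proxy for density is the mean distance to the k nearest normal training samples, so the sketch below uses that as a stand-in, assuming each attribute has already been extracted as a fixed-length vector. The kNN choice, the per-object sum across attributes, and the max-over-objects frame score are illustrative assumptions, not the paper's exact recipe.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def knn_score(query: Vector, normal_bank: Sequence[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from `query` to its k nearest normal samples.
    A larger value means lower estimated density, i.e. more anomalous."""
    if not 1 <= k <= len(normal_bank):
        raise ValueError("k must be between 1 and the number of normal samples")
    dists = sorted(math.dist(query, ref) for ref in normal_bank)
    return sum(dists[:k]) / k


def frame_score(
    objects: List[Dict[str, Vector]],
    banks: Dict[str, Sequence[Vector]],
    k: int = 3,
) -> float:
    """Hypothetical aggregation: sum the per-attribute kNN scores of each object,
    then take the most anomalous object in the frame (0.0 for an empty frame)."""
    return max(
        (sum(knn_score(obj[attr], banks[attr], k) for attr in banks) for obj in objects),
        default=0.0,
    )


if __name__ == "__main__":
    # Toy memory banks of attribute vectors collected from normal training videos.
    banks = {
        "velocity": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
        "pose": [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)],
    }
    walking = {"velocity": (0.5, 0.5), "pose": (0.05, 0.05, 0.0)}
    running = {"velocity": (8.0, 6.0), "pose": (0.05, 0.05, 0.0)}
    print(frame_score([walking], banks) < frame_score([walking, running], banks))  # True
```

Because each attribute keeps its own memory bank, a high frame score can be traced back to the attribute (here, velocity) that drove it, which is the interpretability benefit attribute-based representations aim for.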
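
The VAU-R1 entry lists temporal IoU among the VAUBench evaluation metrics. Under the usual definition, tIoU is the length of the overlap between a predicted and a ground-truth time interval divided by the length of their union. A small sketch, assuming intervals are (start, end) pairs in seconds or frames; the function name is illustrative.

```python
from typing import Tuple

Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Temporal IoU of two (start, end) intervals: overlap length / union length."""
    (p_start, p_end), (g_start, g_end) = pred, gt
    if p_end < p_start or g_end < g_start:
        raise ValueError("intervals must satisfy start <= end")
    # Overlap is zero when the intervals are disjoint.
    overlap = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - overlap
    # Two zero-length intervals have no meaningful IoU; report 0.0.
    return overlap / union if union > 0 else 0.0


if __name__ == "__main__":
    # Predicted anomaly at 2-6 s, ground truth at 4-8 s: overlap 2 s, union 6 s.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # ≈ 0.333
```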
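
Both Vad-R1 (through AVA-GRPO) and VAU-R1 build on Group Relative Policy Optimization. The step that gives GRPO its name is computing each sampled response's advantage relative to the other responses for the same prompt: subtract the group's mean reward and divide by its standard deviation, so no separate value network is needed. The sketch below shows only that normalization, assuming one scalar reward per response; it does not reproduce AVA-GRPO's modifications, which the entry does not describe, and it uses the population standard deviation (implementations differ on this choice).

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of responses sampled for the same prompt:
    (reward - group mean) / (group std + eps), using the population std."""
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward;
    # in that case all advantages are 0 and the group contributes no gradient.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical answers to one video question, rewarded 1 if correct else 0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```

Correct answers receive positive advantages and incorrect ones negative advantages relative to their own group, which is what lets the policy update favor better reasoning traces for each video question.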