<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hybrid on sis-arxiv-vad-papers</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/categories/hybrid/</link><description>Recent content in Hybrid on sis-arxiv-vad-papers</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 20 Jun 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://phuchoang2603.github.io/sis-arxiv-vad-papers/categories/hybrid/index.xml" rel="self" type="application/rss+xml"/><item><title>Multimodal VAD: Visual Anomaly Detection in Intelligent Monitoring System via Audio-Vision-Language</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/multimodal_vad_visual_anomaly_detection_in_intelligent_monitoring_system_via_audio-vision-language/</link><pubDate>Fri, 20 Jun 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/multimodal_vad_visual_anomaly_detection_in_intelligent_monitoring_system_via_audio-vision-language/</guid><description>The paper proposes a dual-stream multimodal video anomaly detection network that leverages video, audio, and text modalities to achieve reliable and precise anomaly detection. 
It introduces effective multimodal fusion, abnormal-aware context prompts (ACPs), and a coarse-support-fine strategy to enhance anomaly discrimination and description, demonstrating superior performance on large-scale datasets.</description></item><item><title>Networking Systems for Video Anomaly Detection: A Tutorial and Survey</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/</guid><description>A comprehensive survey and tutorial exploring the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), emphasizing the integration of AI, IoVT, and computing for real-world deployable systems.</description></item><item><title>AADC-Net: A Multimodal Deep Learning Framework for Automatic Anomaly Detection in Real-Time Surveillance</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aadc-net_a_multimodal_deep_learning_framework_for_automatic_anomaly_detection_in_real-time_surveillance/</link><pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aadc-net_a_multimodal_deep_learning_framework_for_automatic_anomaly_detection_in_real-time_surveillance/</guid><description>Introduces AADC-Net, a multimodal deep neural network leveraging pretrained vision-language models, large language models, and object detection (DETR) for real-time anomaly detection and categorization in surveillance videos. 
The framework addresses data scarcity, imbalance, and computational challenges, demonstrating state-of-the-art performance on multiple datasets, with practical deployment in smart gyms and healthcare settings.</description></item><item><title>Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/</link><pubDate>Thu, 13 Feb 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/</guid><description>Introduces AnomalyVLM, a framework that personalizes vision-language models for zero-shot anomaly detection using hybrid prompts derived from prior knowledge. It incorporates an anomaly region generator and refiner, and uses the hybrid prompts for category-specific customization and improved detection performance.</description></item><item><title>PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/</link><pubDate>Fri, 10 Jan 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/</guid><description>A novel framework (PLOVAD) that applies prompt tuning to large-scale pretrained image-based vision-language models for open-vocabulary video anomaly detection, incorporating domain-specific and anomaly-specific prompts and a temporal module to detect and categorize both seen and unseen anomalies with limited parameters.</description></item><item><title>Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/</guid><description>The paper introduces Ex-VAD, a comprehensive framework for fine-grained and explainable video anomaly detection that leverages visual-language models (VLMs) and large language models (LLMs). It features modules for generating anomaly explanations, fusing multimodal features for coarse detection, and expanding/aligning labels for fine-grained classification, with improved interpretability and accuracy demonstrated on the UCF-Crime and XD-Violence datasets.</description></item><item><title>Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/</link><pubDate>Wed, 17 Apr 2024 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/</guid><description>The paper introduces TTHF, a novel single-stage method that aligns video clips with text prompts for traffic anomaly detection. It emphasizes modeling high-frequency information in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to enhance detection of various traffic anomalies.
The approach combines visual-text semantic alignment, temporal high-frequency modeling, and guided attention mechanisms, achieving superior performance on benchmark datasets.</description></item><item><title>VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadclip-adapting-vision-language-models-for-weakly-supervised-video-anomaly-detection/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadclip-adapting-vision-language-models-for-weakly-supervised-video-anomaly-detection/</guid><description>A novel paradigm for weakly supervised video anomaly detection that leverages a frozen CLIP model with a dual-branch architecture, temporal modeling modules, and prompt mechanisms to exploit vision-language knowledge for both coarse- and fine-grained detection, achieving state-of-the-art performance on benchmark datasets.</description></item><item><title>VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vlavad-vision-language-models-assisted-unsupervised-vad/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vlavad-vision-language-models-assisted-unsupervised-vad/</guid><description>Proposes VLAVAD, an unsupervised video anomaly detection method that leverages vision-language pre-trained models, using semantic features, a Selective Prompt Adapter, and a Sequence State Space Module to improve interpretability and transferability, and achieves state-of-the-art performance on the ShanghaiTech dataset.</description></item><item><title>AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anyanomaly-zero-shot-customizable-video-anomaly-detection-with-lvlm/</link><pubDate>Fri, 01 Dec 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anyanomaly-zero-shot-customizable-video-anomaly-detection-with-lvlm/</guid><description>Proposes AnyAnomaly, a model that uses large vision-language models (LVLMs) for zero-shot, customizable video anomaly detection, detecting user-defined anomalies without additional training through segment-level processing and context-aware visual question answering (VQA). The approach improves generalization across diverse environments and achieves state-of-the-art results on benchmark datasets, demonstrating practical potential for real-world applications.</description></item><item><title>Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/towards-generic-anomaly-detection-and-understanding/</link><pubDate>Tue, 31 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/towards-generic-anomaly-detection-and-understanding/</guid><description>This study explores the use of GPT-4V, a large visual-linguistic model, for generic anomaly detection across multiple modalities and domains, demonstrating its ability to understand global and fine-grained semantics, reason automatically, and improve with prompts.
It evaluates GPT-4V on diverse tasks including industrial, medical, logical, video, 3D, and time series anomaly detection, discussing its promising performance and future directions for enhancement, such as quantitative metrics, expanded benchmarks, multi-round interactions, human feedback, and real-time application.</description></item><item><title>SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/smarthome-bench-a-comprehensive-benchmark-for-video-anomaly-detection-in-smart-homes-using-multi-modal-large-language-models/</link><pubDate>Thu, 05 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/smarthome-bench-a-comprehensive-benchmark-for-video-anomaly-detection-in-smart-homes-using-multi-modal-large-language-models/</guid><description>The paper introduces SmartHome-Bench, the first large-scale dataset and benchmark designed specifically for video anomaly detection (VAD) within smart home environments, incorporating 1,203 annotated videos across seven categories such as Wildlife, Senior Care, and Baby Monitoring. The dataset includes detailed annotations with anomaly tags, descriptions, and rationales, facilitating research on multi-modal large language models (MLLMs) for explainable VAD. It evaluates various adaptation methods, including prompting strategies and a novel taxonomy-driven reflective LLM chain (TRLC), demonstrating significant performance improvements and highlighting current model limitations. 
The study aims to advance smart home security by providing a dedicated benchmark and a novel framework for enhancing MLLM-based anomaly detection and reasoning.</description></item><item><title>A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/</guid><description>This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels and adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.</description></item><item><title>Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/</guid><description>Proposes a zero-shot skeleton-based video anomaly detection framework leveraging action semantic typicality and context uniqueness learning, utilizing language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target-domain training data.</description></item><item><title>Aligning Effective Tokens with Video Anomaly in Large Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aligning-effective-tokens-with-video-anomaly-in-large-language-models/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/aligning-effective-tokens-with-video-anomaly-in-large-language-models/</guid><description>Proposes VA-GPT, a multimodal large language model for video anomaly detection and understanding, utilizing effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruction-following fine-tuning data and cross-domain benchmarks for robustness evaluation.</description></item><item><title>Anomaly-Led Prompting Learning Caption Generating Model and Benchmark</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anomaly-led_prompting_learning_caption_generating_model_and_benchmark/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anomaly-led_prompting_learning_caption_generating_model_and_benchmark/</guid><description>Introduces a new task for comprehensive video anomaly captioning, proposes a large-scale benchmark dataset CVACBench with fine-grained annotations, and designs a baseline model AGPFormer using prompt learning to improve anomaly understanding and description accuracy.</description></item><item><title>Anomize: Better Open Vocabulary Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/li_anomize_better_open_vocabulary_video_anomaly_detection_cvpr_2025_paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/li_anomize_better_open_vocabulary_video_anomaly_detection_cvpr_2025_paper/</guid><description>The paper introduces the Anomize framework that addresses detection ambiguity and categorization confusion in open
vocabulary video anomaly detection (OVVAD) by leveraging visual and textual data augmentation, dual-stream mechanisms, and label relation guidance, achieving superior performance on multiple datasets.</description></item><item><title>AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/assistpda-an-online-video-surveillance-assistant-for-video-anomaly-prediction-detection-and-analysis/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/assistpda-an-online-video-surveillance-assistant-for-video-anomaly-prediction-detection-and-analysis/</guid><description>Introduces AssistPDA, a framework for real-time online video anomaly prediction, detection, and analysis that leverages vision-language models, featuring a novel spatiotemporal relation distillation module and a newly constructed benchmark dataset, VAPDA-127K.</description></item><item><title>AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/avadclip-audio-visual-collaboration-for-robust-video-anomaly-detection/</guid><description>A novel weakly supervised framework leveraging audio-visual collaboration to improve the robustness and accuracy of video anomaly detection.</description></item><item><title>CLIP: Assisted Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/clip-assisted/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/clip-assisted/</guid><description>Proposes a generalized framework for video anomaly detection based on
CLIP, introducing generative anomaly descriptions, temporal modules for capturing temporal correlations, and object-centric approaches to improve performance and robustness, with extensive experiments on the UCF-Crime and ShanghaiTech datasets.</description></item><item><title>Cross-Domain Learning for Video Anomaly Detection with Limited Supervision</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/cross-domain-learning-for-vad-with-limited-supervision/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/cross-domain-learning-for-vad-with-limited-supervision/</guid><description>A weakly supervised framework that incorporates external unlabeled data during training, estimating the prediction bias and adaptively minimizing it using predicted uncertainty to improve cross-domain generalization in video anomaly detection.</description></item><item><title>Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/exploring-large-vision-language-models-for-robust-and-efficient-industrial-anomaly-detection/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/exploring-large-vision-language-models-for-robust-and-efficient-industrial-anomaly-detection/</guid><description>Proposes a novel approach (CLAD) leveraging large vision-language models with contrastive cross-modal training for improved industrial anomaly detection and localization, enhancing interpretability and robustness.</description></item><item><title>Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/follow-the-rules-reasonin-for-vad-with-llm/</guid><description>Proposes a rule-based reasoning framework, AnomalyRuler, for video anomaly detection using large language models, enabling fast scenario adaptation with few-normal-shot prompting and enhanced robustness through strategic modules.</description></item><item><title>Harnessing Large Language Models for Training-free Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/zanella_harnessing_large_language_models_for_training-free_video_anomaly_detection_cvpr_2024_paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/zanella_harnessing_large_language_models_for_training-free_video_anomaly_detection_cvpr_2024_paper/</guid><description>Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs).
Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.</description></item><item><title>Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vad-towards-unbiased-and-explainable-video-anomaly-detection-via-multi-modal-llm/</guid><description>A novel framework leveraging multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models, including a new dataset VAD-Instruct50k with single-frame annotations and explanatory instruction data.</description></item><item><title>Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vau-towards-long-term-video-anomaly-understanding-at-any-granularity/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/holmes-vau-towards-long-term-video-anomaly-understanding-at-any-granularity/</guid><description>A semi-automated hierarchical video annotation framework combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex and long-term video anomalies across multiple temporal scales.</description></item><item><title>Language-guided Open-world Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/</guid><description>Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.</description></item><item><title>Learning to Understand Open-World Video Anomalies</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/</guid><description>Introduces HAWK, a novel framework leveraging interactive large Visual Language Models with explicit and implicit motion modality integration, auxiliary consistency loss, and detailed language annotations for diverse video anomaly scenarios. Demonstrates state-of-the-art performance in video description and question-answering tasks across multiple open-world datasets, with extensive annotated data and generation pipelines to enhance practical anomaly understanding and interaction capabilities.</description></item><item><title>Open-Vocabulary Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/wu_open-vocabulary_video_anomaly_detection_cvpr_2024_paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/wu_open-vocabulary_video_anomaly_detection_cvpr_2024_paper/</guid><description>This paper explores open-vocabulary video anomaly detection (OVVAD) leveraging pre-trained large models to detect and categorize seen and unseen anomalies.
It proposes a disentangled approach with class-agnostic detection and class-specific classification modules, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.</description></item><item><title>Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-prompt-with-normality-guidance-for-weakly-supervised-video-anomaly-detection/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-prompt-with-normality-guidance-for-weakly-supervised-video-anomaly-detection/</guid><description>Proposes a novel pseudo-label generation and self-training framework incorporating CLIP for text-image alignment, learnable text prompts, normality visual prompts, a pseudo-label generation module guided by normality clues, and a self-adaptive temporal dependence learning module, achieving state-of-the-art performance on benchmark datasets.</description></item><item><title>Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/towards-zero-shot-anomaly-detection-and-reasoning/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/towards-zero-shot-anomaly-detection-and-reasoning/</guid><description>Introduces a specialist visual assistant, Anomaly-OV, leveraging an anomaly expert and a visual token selection mechanism to improve zero-shot anomaly detection and reasoning, and establishes new datasets and benchmarks in the domain.</description></item><item><title>Unspecified</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-paper/</guid><description>Unspecified</description></item><item><title>Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</guid><description>Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of multimodal large language models in video anomaly detection and understanding tasks.</description></item><item><title>VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vau-r1-advancing-video-anomaly-understanding-via-reinforcement-fine-tuning/</guid><description>Introduces VAU-R1, a reinforcement fine-tuning framework leveraging Group Relative Policy Optimization (GRPO) to enhance multimodal large language models&amp;rsquo; (MLLMs) reasoning capabilities in video anomaly understanding (VAU). Develops VAUBench, a comprehensive Chain-of-Thought benchmark with rich annotations across perception, grounding, reasoning, and classification tasks, supported by multiple evaluation metrics including VAU-Eval, QA accuracy, temporal IoU, and Factual Consistency.
Demonstrates significant improvements over supervised fine-tuning in question answering accuracy, temporal localization, and interpretability, thereby establishing a scalable, interpretable, and reasoning-aware VAU framework.</description></item><item><title>VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/ye_vera_explainable_video_anomaly_detection_via_verbalized_learning_of_vision-language_cvpr_2025_paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/ye_vera_explainable_video_anomaly_detection_via_verbalized_learning_of_vision-language_cvpr_2025_paper/</guid><description>Introduces VERA, a framework that enables frozen vision-language models to perform explainable video anomaly detection by learning detailed anomaly-characterization questions from coarsely labeled data, without model parameter modifications. 
The method decomposes complex reasoning into reflections on guiding questions, optimizes them via verbal interactions, and guides VLMs to generate segment- and frame-level anomaly scores with improved explainability and performance.</description></item><item><title>Video Anomaly Detection in 10 Years: A Survey and Outlook</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-2/</guid><description>A comprehensive survey exploring deep learning-based video anomaly detection, including emerging paradigms such as weakly supervised, self-supervised, and unsupervised approaches, with a focus on core challenges, feature extraction, supervision schemes, loss functions, regularization techniques, and the potential of vision-language models (VLMs) for enhanced anomaly detection.</description></item><item><title>VISIONGPT: LLM-ASSISTED REAL-TIME ANOMALY DETECTION FOR SAFE VISUAL NAVIGATION</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/visiongpt-llm-assisted-real-time-anomaly-detection-for-safe-visual-navigation/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/visiongpt-llm-assisted-real-time-anomaly-detection-for-safe-visual-navigation/</guid><description>A framework combining lightweight object detection and large language models for real-time visual navigation safety and anomaly detection, with dynamic scenario switching and prompt engineering.</description></item><item><title>SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/</link><pubDate>Mon, 01 May 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/</guid><description>Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for analysis by the slow detector, and fuses the outputs of both detectors for robust detection.</description></item><item><title>Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mapping</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/liu_generating_anomalies_for_video_anomaly_detection_with_prompt-based_feature_mapping_cvpr_2023_paper/</guid><description>The paper proposes a prompt-based feature mapping framework (PFMF) to generate unseen anomalies of unbounded types and narrow the scene gap for video anomaly detection, outperforming state-of-the-art methods on multiple datasets.</description></item><item><title>Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven-traffic-anomaly-detection-with-temporal-high-frequency/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven-traffic-anomaly-detection-with-temporal-high-frequency/</guid><description>Introduces a novel single-stage approach (TTHF) for traffic anomaly detection that aligns video clips with text prompts and
models high-frequency temporal changes, enhanced by an attention focusing mechanism, outperforming state-of-the-art methods on benchmark datasets.</description></item><item><title>Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/toward-video-anomaly-retrieval-from-video/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/toward-video-anomaly-retrieval-from-video/</guid><description>Proposes a new task called Video Anomaly Retrieval (VAR), introduces two large-scale benchmarks (UCFCrime-AR and XDViolence-AR), and presents a model called Anomaly-Led Alignment Network (ALAN) for VAR, focusing on retrieving long untrimmed videos using cross-modal queries such as language descriptions and synchronous audio. The work introduces anomaly-led sampling, a pretext task (VPMPM), and cross-modal alignment strategies to address the challenges of VAR in practical scenarios.</description></item></channel></rss>