<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>UBnormal on sis-arxiv-vad-papers</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/benchmarks/ubnormal/</link><description>Recent content in UBnormal on sis-arxiv-vad-papers</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 01 Apr 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://phuchoang2603.github.io/sis-arxiv-vad-papers/benchmarks/ubnormal/index.xml" rel="self" type="application/rss+xml"/>
<item><title>Networking Systems for Video Anomaly Detection: A Tutorial and Survey</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-4/</guid><description>A comprehensive survey and tutorial covering the assumptions, frameworks, recent advances, applications, and future trends of Networking Systems for Video Anomaly Detection (NSVAD), with an emphasis on integrating AI, the Internet of Video Things (IoVT), and computing to build deployable real-world systems.</description></item>
<item><title>Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/</link><pubDate>Thu, 13 Feb 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/</guid><description>Introduces AnomalyVLM, a framework that personalizes vision-language models for zero-shot anomaly detection using hybrid prompts derived from prior knowledge. It incorporates an anomaly region generator and refiner, and uses the hybrid prompts for category-specific customization and improved detection performance.</description></item>
<item><title>PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/</link><pubDate>Fri, 10 Jan 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/plovad_prompting_vision-language_models_for_open_vocabulary_video_anomaly_detection/</guid><description>A framework (PLOVAD) that applies prompt tuning to large-scale pretrained image-based vision-language models for open-vocabulary video anomaly detection. It combines domain-specific and anomaly-specific prompts with a temporal module to detect and categorize both seen and unseen anomalies while training only a limited number of parameters.</description></item>
<item><title>Text-Driven Traffic Anomaly Detection With Temporal High-Frequency Modeling in Driving Videos</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/</link><pubDate>Wed, 17 Apr 2024 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/text-driven_traffic_anomaly_detection_with_temporal_high-frequency_modeling_in_driving_videos/</guid><description>The paper introduces TTHF, a single-stage method that aligns video clips with text prompts for traffic anomaly detection. It models high frequency in the temporal domain to capture dynamic changes in driving scenes, and proposes an attentive anomaly focusing mechanism to improve detection of diverse traffic anomalies. By combining visual-text semantic alignment, temporal high-frequency modeling, and guided attention, the approach achieves superior performance on benchmark datasets.</description></item>
<item><title>CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/callm_cascading_autoencoder_and_large_language_model_for_video_anomaly_detection/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/callm_cascading_autoencoder_and_large_language_model_for_video_anomaly_detection/</guid><description>This paper introduces a cascade system that combines a 3D autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection, using weak supervision and multimodal capabilities to improve both the detection and the explanation of anomalies.</description></item>
<item><title>AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anyanomaly-zero-shot-customizable-video-anomaly-detection-with-lvlm/</link><pubDate>Fri, 01 Dec 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/anyanomaly-zero-shot-customizable-video-anomaly-detection-with-lvlm/</guid><description>Proposes AnyAnomaly, a model that uses large vision-language models (LVLMs) for zero-shot, customizable video anomaly detection: it detects user-defined anomalies without additional training by combining segment-level processing with context-aware visual question answering (VQA). The approach generalizes across diverse environments and achieves state-of-the-art results on benchmark datasets, showing practical potential for real-world applications.</description></item>
<item><title>Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/action-hints-semantic-typicality-and-context-uniqueness-for/</guid><description>Proposes a zero-shot skeleton-based video anomaly detection framework that learns action semantic typicality and context uniqueness, using language-guided semantic modeling and test-time scene-adaptive boundaries to improve generalization without target-domain training data.</description></item>
<item><title>Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/</guid><description>Proposes a zero-shot skeleton-based video anomaly detection framework that learns action semantic typicality and context uniqueness through a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target-domain training data.</description></item>
<item><title>Language-guided Open-world Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/</guid><description>Proposes an open-world VAD paradigm guided by natural language, featuring a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.</description></item>
<item><title>Hawk: Learning to Understand Open-World Video Anomalies</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hawk--learning-to-understand-open-world-video-anomalies/</guid><description>Introduces HAWK, a framework built on interactive large visual language models that integrates explicit and implicit motion modalities, adds an auxiliary consistency loss, and uses detailed language annotations covering diverse video anomaly scenarios. It achieves state-of-the-art performance on video description and question-answering tasks across multiple open-world datasets, and contributes extensive annotated data and generation pipelines to support practical anomaly understanding and interaction.</description></item>
<item><title>Open-Vocabulary Video Anomaly Detection</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/wu_open-vocabulary_video_anomaly_detection_cvpr_2024_paper/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/wu_open-vocabulary_video_anomaly_detection_cvpr_2024_paper/</guid><description>This paper explores open-vocabulary video anomaly detection (OVVAD), using pre-trained large models to detect and categorize both seen and unseen anomalies. It proposes a disentangled approach with a class-agnostic detection module and a class-specific classification module, enhanced by semantic knowledge injection, anomaly synthesis, and joint optimization, to achieve state-of-the-art performance.</description></item>
<item><title>Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</guid><description>Proposes a structured Perception-to-Cognition Chain-of-Thought and introduces the Vad-Reasoning dataset, along with an improved reinforcement learning algorithm, AVA-GRPO, to strengthen the deep reasoning capabilities of Multimodal Large Language Models on video anomaly detection and understanding tasks.</description></item>
<item><title>SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/</link><pubDate>Mon, 01 May 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/</guid><description>Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for analysis by the slow detector, and fuses the outputs of both detectors for robust detection.</description></item>
</channel></rss>