Semi Supervised on sis-arxiv-vad-papers

https://phuchoang2603.github.io/sis-arxiv-vad-papers/categories/semi-supervised/
Recent content in Semi Supervised on sis-arxiv-vad-papers
Mon, 28 Apr 2025 00:00:00 +0000

Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/sherlock-towards-multi-scene-video-abnormal-event-extraction-and-localization-via-a-global-local-spatial-sensitive-llm/
Mon, 28 Apr 2025 00:00:00 +0000
Proposes a new task (M-VAE) for structured extraction and localization of abnormal events in videos, introduces Sherlock model with a Global-local Spatial-sensitive MoE module and a Spatial Imbalance Regulator, and demonstrates its effectiveness through extensive experiments.

Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/personalizing_vision-language_models_with_hybrid_prompts_for_zero-shot_anomaly_detection/
Thu, 13 Feb 2025 00:00:00 +0000
Introduces AnomalyVLM, a framework leveraging hybrid prompts derived from prior knowledge to enhance zero-shot anomaly detection by personalizing vision-language models, incorporating an anomaly region generator and refiner, and utilizing hybrid prompts for category-specific customization and improved detection performance.

A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/survey-3/
Sun, 01 Oct 2023 00:00:00 +0000
This survey provides a comprehensive overview of deep learning-based Video Anomaly Detection (VAD), covering challenges, methodologies, domain-specific applications, and future research directions across human-centric, vehicle-centric, and environment-centric contexts. It introduces a taxonomy of supervision levels, adaptive learning strategies, and explores diverse application areas including healthcare, public safety, road surveillance, and disaster detection, emphasizing the latest advancements and open challenges.

A VLM-based Method for Visual Anomaly Detection in Robotic Scientific Laboratories
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/a-vlm-based-method-for-visual-anomaly-detection-in-robotic-scientific-laboratories/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a vision-language reasoning approach utilizing hierarchical prompts and Chain-of-Thought inference for process anomaly detection in scientific experiments. Constructs a benchmark based on real chemical laboratory workflows and demonstrates improved accuracy with prompt granularity, validated through real-world robotic lab testing.

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/typicality-and-context-uniqueness-for/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a zero-shot skeleton-based video anomaly detection framework utilizing action semantic typicality and context uniqueness learning, involving a language-guided typicality modeling module and a test-time context uniqueness analysis module, achieving state-of-the-art results without target domain training data.

An Attribute-based Method for Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/an-attribute-based-method-for-video-anomaly-detection/
Sun, 01 Oct 2023 00:00:00 +0000
A simple attribute-based approach that represents each object by velocity and pose attributes, combining these with deep representations, and uses density estimation for anomaly scoring, achieving state-of-the-art performance.

Language-guided Open-world Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/language-guided-open-world-vad/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel open-world VAD paradigm guided by natural language, with a dynamic anomaly definition, regularization strategies, and a large-scale dataset (PreVAD) with multi-level annotations and descriptions. Achieves state-of-the-art zero-shot performance on seven datasets.

Learning Suspected Anomalies from Event Prompts for Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/learning-suspected-anomalies-from-event-prompts/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a novel framework named LAP that leverages textual event prompts and semantic similarity for weakly supervised video anomaly detection. It introduces a multi-prompt learning process, pseudo anomaly labeling, and integrates semantic features derived from a prompt dictionary to guide the detection model, resulting in improved performance across multiple datasets.

Simplifying Traffic Anomaly Detection with Video Foundation Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/simplifying-traffic-anomaly-detection-with-video-foundation-models/
Sun, 01 Oct 2023 00:00:00 +0000
The paper investigates the use of simple encoder-only Video Vision Transformers (Video ViTs) with various pre-training strategies for traffic anomaly detection (TAD), demonstrating that with strong pretraining and domain adaptation, minimal architectural complexity can outperform complex prior methods, highlighting the importance of pretraining strategies like Masked Video Modeling (MVM).

SUVAD: Semantic Understanding Based Video Anomaly Detection Using MLLM
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/suvad_semantic_understanding_based_video_anomaly_detection_using_mllm/
Sun, 01 Oct 2023 00:00:00 +0000
Proposes a training-free video anomaly detection method leveraging multi-modal large language models for semantic understanding of videos, enabling scene generalization, interpretability, and flexible anomaly definition without retraining.

VADSK: VIDEO ANOMALY DETECTION WITH STRUCTURED KEYWORDS
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vadsk-video-anomaly-detection-with-structured/
Sun, 01 Oct 2023 00:00:00 +0000
A lightweight, interpretable, two-stage video anomaly detection pipeline employing foundational models for frame description generation and keyword-based classification, achieving comparable performance to state-of-the-art methods with real-time inference and enhanced interpretability.

Video Anomaly Detection and Explanation via Large Language Models
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/video-anomaly-detection-and-explanation-via-large-language-models/
Sun, 01 Oct 2023 00:00:00 +0000
The paper introduces VAD-LLaMA, a novel framework integrating video-based large language models (VLLMs) for threshold-free, explainable video anomaly detection, featuring a Long-Term Context (LTC) module and a three-phase training process that enhances long-range context modeling and minimizes data annotation costs.

Advanced Video Anomaly Detection Using Deep Learning
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vane-bench-video-anomaly-evaluation/
Sat, 15 Jul 2023 00:00:00 +0000
This paper introduces a novel deep learning framework for detecting anomalies in video content by leveraging semi-supervised approaches that require minimal labeled data, enhancing robustness and efficiency.

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/slowfastvad-video-anomaly-detection-via-integrating-simpledetector-and-rag-enhanced-vision-language-model/
Mon, 01 May 2023 00:00:00 +0000
Proposes a hybrid framework that integrates a fast anomaly detector with a slow, RAG-enhanced vision-language model to improve efficiency and interpretability in video anomaly detection. It employs a retrieval-augmented reasoning module for better scene-specific adaptation, uses an entropy-based intervention strategy to select ambiguous segments for slow detector analysis, and fuses outputs for robust detection.

Delving into CLIP latent space for Video Anomaly Recognition
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/delving-into-clip-latent-space-for-video-anomaly-recognition/
Sun, 01 Jan 2023 00:00:00 +0000
Proposes AnomalyCLIP, a novel method leveraging Large Language and Vision (LLV) models like CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling; demonstrates state-of-the-art performance on multiple benchmarks.

Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/hierarchical-semantic-contrast-for-scene-aware-video-anomaly-detection/
Sun, 01 Jan 2023 00:00:00 +0000
The paper proposes a hierarchical semantic contrast (HSC) method that leverages scene-aware autoencoders, semantic contrastive learning, and motion augmentation for improved scene-dependent and scene-independent video anomaly detection. It incorporates pre-trained video parsing models, hierarchical contrastive learning at scene and object levels, and skeleton-based motion augmentation to make the normal feature representations more compact and discriminative, thereby enhancing anomaly detection performance.

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer
https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/transanomaly_video_anomaly_detection_using_video_vision_transformer/
Mon, 30 Aug 2021 00:00:00 +0000
A prediction-based video anomaly detection approach combining U-Net and Video Vision Transformer (ViViT), with modifications for video prediction, capturing richer temporal and global context information, enabling anomaly localization.