Qinghua Hu

2025

Multimodal VAD: Visual Anomaly Detection in Intelligent Monitoring System via Audio-Vision-Language

The paper proposes a dual-stream network for multimodal video anomaly detection that combines video, audio, and text to detect anomalies reliably and precisely. It introduces a multimodal fusion scheme, abnormal-aware context prompts (ACPs), and a coarse-support-fine strategy to improve both the discrimination and the description of anomalies, and it reports superior performance on large-scale datasets.