VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
A novel paradigm for weakly supervised video anomaly detection that leverages a frozen CLIP model through a dual-branch architecture, temporal modeling modules, and prompt mechanisms, using vision-language knowledge for both coarse- and fine-grained detection tasks and achieving state-of-the-art performance on benchmarks.
