Delving into CLIP latent space for Video Anomaly Recognition
·11434 words·54 mins
Proposes AnomalyCLIP, a novel method leveraging Large Language and Vision (LLV) models like CLIP, combined with multiple instance learning and a re-centring transformation of the CLIP feature space, to detect and classify video anomalies and recognize anomaly types. Introduces a Selector model with prompt learning and a Temporal Transformer-based model for temporal dependency modeling; demonstrates state-of-the-art performance on multiple benchmarks.
