VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
Proposes VLAVAD, an unsupervised video anomaly detection method built on pre-trained vision-language models. It combines semantic features with a Selective Prompt Adapter and a Sequence State Space Module to improve interpretability and transferability, and it achieves state-of-the-art performance on the ShanghaiTech dataset.
