Muchao Ye

Introduces VERA, a framework that enables frozen vision-language models (VLMs) to perform explainable video anomaly detection by learning detailed anomaly-characterization questions from coarsely labeled data, without modifying model parameters. The method decomposes the complex reasoning into reflections on guiding questions, optimizes those questions through verbal interactions, and uses them to guide VLMs in generating segment- and frame-level anomaly scores, with improved explainability and performance.
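
To make the relationship between segment- and frame-level anomaly scores concrete, here is a minimal sketch of one plausible way to go from one to the other. It is an illustration under stated assumptions, not VERA's actual procedure: the summary does not specify how frame-level scores are derived, so the repeat-then-smooth scheme, the function names `segment_to_frame_scores` and `moving_average`, and the parameters `frames_per_segment` and `window` are all hypothetical.

```python
def segment_to_frame_scores(segment_scores, frames_per_segment):
    """Repeat each segment-level score over the frames of that segment.

    Assumption: every segment covers the same number of frames.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    frame_scores = []
    for score in segment_scores:
        frame_scores.extend([float(score)] * frames_per_segment)
    return frame_scores


def moving_average(scores, window):
    """Smooth scores with a centered moving average.

    The window shrinks at the sequence edges so that only real
    frames contribute to each average.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical usage: two segments of two frames each, where the
# second segment was scored as anomalous by the model.
frame_scores = segment_to_frame_scores([0, 1], frames_per_segment=2)
smoothed_scores = moving_average(frame_scores, window=3)
```

Smoothing is included only because hard per-segment scores produce abrupt jumps at segment boundaries; whether and how to smooth is a design choice this sketch does not take from the source.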

2023

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models