Harnessing Large Language Models for Training-free Video Anomaly Detection
·6913 words·33 mins
Introduces a training-free method for video anomaly detection (VAD) leveraging pre-trained large language models (LLMs) and vision-language models (VLMs). Proposes techniques for caption cleaning, scene description, and anomaly scoring without additional training, demonstrating superior performance on surveillance datasets.
