Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM
8242 words · 39 mins
Holmes-VAD is a framework that uses multimodal instructions and large-scale data to make video anomaly detection with multimodal large language models unbiased, interpretable, and accurate. It also introduces VAD-Instruct50k, a new dataset with single-frame annotations and explanatory instruction data.
