CALLM: Cascading Autoencoder and Large Language Model for Video Anomaly Detection
This paper introduces a cascade system that combines a 3D autoencoder with a Large Visual Language Model (LVLM) for video anomaly detection. It uses weak supervision and multimodal capabilities to improve both the detection and the explanation of anomalies.
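To make the idea of a cascade concrete, here is a minimal sketch of one common way to chain two detectors. It is not the paper's implementation: the summary above does not say how the two stages are connected, so the gating rule (reconstruction error against a fixed `threshold`) and every name here (`reconstruct`, `second_stage`, `cascade`) are assumptions made for illustration. The first stage stands in for the 3D autoencoder and the second for the LVLM.

```python
import numpy as np


def reconstruction_error(clip, reconstruct):
    """Mean squared error between a clip and its reconstruction."""
    return float(np.mean((clip - reconstruct(clip)) ** 2))


def cascade(clips, reconstruct, second_stage, threshold):
    """Run a two-stage cascade over a list of clips.

    Assumed design, not taken from the paper:
      - `reconstruct` is a hypothetical stand-in for the 3D autoencoder.
      - `second_stage` is a hypothetical stand-in for the LVLM; it returns
        (is_anomalous, explanation) for a single clip.
      - Clips whose error is at or below `threshold` are labelled normal
        and never reach the second stage.
    """
    results = []
    for clip in clips:
        err = reconstruction_error(clip, reconstruct)
        if err <= threshold:
            # Cheap first stage is confident: skip the expensive model.
            results.append({"error": err, "anomalous": False, "explanation": None})
        else:
            anomalous, explanation = second_stage(clip)
            results.append(
                {"error": err, "anomalous": anomalous, "explanation": explanation}
            )
    return results
```

The point of this arrangement is cost: the expensive second stage only runs on clips the first stage could not reconstruct well. How the threshold is chosen, and whether the actual system gates clips this way at all, is left open by the summary.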
