Learning to Understand Open-World Video Anomalies
Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen
Introduces HAWK, a framework built on interactive large vision-language models (VLMs). It integrates the motion modality both explicitly and implicitly, adds an auxiliary consistency loss, and relies on detailed language annotations covering diverse video anomaly scenarios. The authors report state-of-the-art performance on video description and question-answering tasks across multiple open-world datasets. They also contribute extensive annotated data and annotation-generation pipelines intended to improve practical anomaly understanding and interaction.
