Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Li Yu, Shanjun Zhang, Nong Sang
This work combines a semi-automated hierarchical video annotation framework with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aiming at comprehensive understanding of complex, long-term video anomalies across multiple temporal scales.
