Xiaonan Huang

2023

Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead

31 October 2023·12272 words·58 mins

Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, Weiming Shen

This study explores the use of GPT-4V, a large visual-linguistic model, for generic anomaly detection across multiple modalities and domains, demonstrating its ability to understand global and fine-grained semantics, reason automatically, and improve with prompts. It evaluates GPT-4V on diverse tasks including industrial, medical, logical, video, 3D, and time series anomaly detection, discussing its promising performance and future directions for enhancement, such as quantitative metrics, expanded benchmarks, multi-round interactions, human feedback, and real-time application.

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

1 October 2023·11025 words·52 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Li Yu, Shanjun Zhang, Nong Sang

UCF-Crime Other Hybrid Other

A semi-automated hierarchical video annotation framework combined with a novel Anomaly-focused Temporal Sampler and a multimodal large language model, aimed at comprehensive understanding of complex and long-term video anomalies across multiple temporal scales.

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 October 2023·8242 words·39 mins

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

ShanghaiTech UCF-Crime XD-Violence Hybrid Method

A novel framework that leverages multimodal instructions and large-scale datasets to enable unbiased, interpretable, and accurate video anomaly detection with large language models. The work also introduces a new dataset, VAD-Instruct50k, with single-frame annotations and explanatory instruction data.