Ruidi Fan

2023

Aligning Effective Tokens with Video Anomaly in Large Language Models

1 October 2023·8317 words·40 mins

Yingxian Chen, Jiahui Liu, Ruidi Fan, Yanwei Li, Chirui Chang, Shizhen Zhao, Wilton W.T. Fok, Xiaojuan Qi, Yik-Chung Wu

XD-Violence Hybrid Other

Proposes VA-GPT, a multimodal large language model for video anomaly detection and understanding, which uses effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruction-following fine-tuning data and cross-domain benchmarks for robustness evaluation.
