Skip to main content

Ruidi Fan

2023

Aligning Effective Tokens with Video Anomaly in Large Language Models

Proposes VA-GPT, a multimodal Large Language Model for video anomaly detection and understanding, utilizing effective token selection and generation modules (SETS and TETG) to improve spatial and temporal localization of anomalies. Introduces instruct-following fine-tuning data and cross-domain benchmarks for robustness evaluation.