<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Xiaochun Cao on sis-arxiv-vad-papers</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/authors/xiaochun-cao/</link><description>Recent content in Xiaochun Cao on sis-arxiv-vad-papers</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 01 Jan 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://phuchoang2603.github.io/sis-arxiv-vad-papers/authors/xiaochun-cao/index.xml" rel="self" type="application/rss+xml"/><item><title>Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/3552_ex_vad_explainable_fine_g/</guid><description>The paper introduces Ex-VAD, a comprehensive framework for fine-grained and explainable video anomaly detection that leverages visual-language models (VLMs) and large language models (LLMs). 
It features modules for generating anomaly explanations, fusing multimodal features for coarse detection, and expanding and aligning labels for fine-grained classification; experiments on the UCF-Crime and XD-Violence datasets demonstrate improved interpretability and accuracy.</description></item><item><title>Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought</title><link>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</link><pubDate>Sun, 01 Oct 2023 00:00:00 +0000</pubDate><guid>https://phuchoang2603.github.io/sis-arxiv-vad-papers/papers/vad-r1-towards-video-anomaly-reasoning-via/</guid><description>Proposes a structured Perception-to-Cognition Chain-of-Thought, introduces the Vad-Reasoning dataset, and presents an improved reinforcement learning algorithm, AVA-GRPO, to enhance the deep reasoning capabilities of Multimodal Large Language Models in video anomaly detection and understanding tasks.</description></item></channel></rss>