CV, AI, CL, LG | Dec 4, 2025

SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

arXiv:2512.04643v1 | 4 citations | h-index: 3
Originality: Highly original
AI Analysis

By mitigating temporal hallucinations in VideoLLMs, this work addresses a critical problem for video AI applications. It offers a novel method for a known bottleneck rather than an incremental improvement.

The paper tackles temporal hallucination in Video Large Language Models, where models generate temporally inconsistent event descriptions. It proposes SEASON, a training-free method that improves temporal faithfulness by diagnosing each token's hallucination tendency and applying adaptive contrastive decoding. SEASON outperforms existing training-free mitigation approaches on three hallucination benchmarks and also improves performance on four general video understanding benchmarks.

Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal information in videos when responding to user queries. As a result, they often generate descriptions of events that are temporally inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g., object mismatches), temporal reasoning in video understanding remains relatively underexplored. To address this issue, we propose Self-Diagnostic Contrastive Decoding (SEASON), a training-free method that adaptively enhances temporal and spatial faithfulness for each output token. It achieves this by dynamically diagnosing each token's hallucination tendency and applying adaptive contrastive decoding against its corresponding temporal and spatial negatives. Extensive experiments demonstrate that SEASON outperforms all existing training-free hallucination mitigation approaches on three hallucination examination benchmarks, while further improving VideoLLMs across four general video understanding benchmarks. The code will be released upon acceptance.
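The abstract does not give SEASON's exact formulation, so the following is only a minimal sketch of the general idea of contrastive decoding against two negatives, not the authors' method. It assumes the common form in which the original logits are amplified and the logits from each negative input are subtracted. The function names, the per-token weights `w_temporal` and `w_spatial`, and the toy logits are all hypothetical; how SEASON diagnoses a token and derives its weights is not specified here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(original, temporal_neg, spatial_neg, w_temporal, w_spatial):
    """Amplify the original logits and subtract the negatives' logits.

    ASSUMPTION: this weighting form is the generic contrastive-decoding
    recipe, not a formula taken from the paper. The weights would come from
    a per-token diagnosis step that is not modeled here.
    """
    return [
        (1.0 + w_temporal + w_spatial) * o - w_temporal * t - w_spatial * s
        for o, t, s in zip(original, temporal_neg, spatial_neg)
    ]


def pick_token(original, temporal_neg, spatial_neg, w_temporal, w_spatial):
    """Greedy choice of the next token index after contrastive adjustment."""
    adjusted = contrastive_logits(
        original, temporal_neg, spatial_neg, w_temporal, w_spatial
    )
    probs = softmax(adjusted)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy vocabulary of three tokens (hypothetical values).
# Token 0 stays likely even when temporal order is corrupted, which suggests
# it is driven by priors rather than by the video's temporal content.
original = [2.0, 1.9, 0.1]
temporal_neg = [2.5, 0.5, 0.1]   # logits given a temporally perturbed video
spatial_neg = [2.0, 1.9, 0.1]    # logits given a spatially perturbed video

plain_choice = pick_token(original, temporal_neg, spatial_neg, 0.0, 0.0)
contrastive_choice = pick_token(original, temporal_neg, spatial_neg, 1.0, 0.0)
```

With both weights at zero the sketch reduces to ordinary greedy decoding. Raising the temporal weight penalizes tokens that the temporally perturbed input also favors, so the choice shifts toward the token that depends on the true frame order.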
