CV · AI · Aug 29, 2024

Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

arXiv:2408.16272v1 · 11 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in VTG for applications involving untrimmed videos and open-vocabulary queries, representing an incremental advance by adapting uncertainty estimation to this domain.

The paper tackles unreliable predictions in Video Temporal Grounding (VTG) under open-world conditions such as noisy and out-of-distribution data. It introduces SRAM, a robust network module that integrates Deep Evidential Regression (DER) with a novel Geom-regularizer, improving robustness and interpretability in VTG tasks.

Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. This leads to unreliable predictions for noisy, corrupted, and out-of-distribution data. Adapting VTG models to dynamically estimate uncertainties based on user input can address this issue. To this end, we introduce SRAM, a robust network module that benefits from a two-stage cross-modal alignment task. More importantly, it integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training, thus allowing the model to say "I do not know" in scenarios beyond its handling capacity. However, the direct application of traditional DER theory and its regularizer reveals structural flaws, leading to unintended constraints in VTG tasks. In response, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up. To the best of our knowledge, this marks the first successful attempt of DER in VTG. Our extensive quantitative and qualitative results affirm the effectiveness, robustness, and interpretability of our modules and the uncertainty learning paradigm in VTG tasks. The code will be made available.
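
For readers unfamiliar with DER, the following is a minimal sketch of the original Deep Evidential Regression formulation, in which a network outputs the four parameters (gamma, nu, alpha, beta) of a Normal-Inverse-Gamma distribution for each regression target. It is not the paper's SRAM module or its Geom-regularizer, whose details the abstract does not give. The function names are hypothetical, and the regularizer shown is the traditional one that the abstract describes as problematic for VTG.

```python
import math


def nig_uncertainty(nu: float, alpha: float, beta: float) -> tuple[float, float]:
    """Return (aleatoric, epistemic) variance from NIG parameters.

    Assumes the standard DER parameterisation: nu > 0, alpha > 1, beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2], noise in the data
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu], uncertainty of the model
    return aleatoric, epistemic


def der_nll(y: float, gamma: float, nu: float, alpha: float, beta: float) -> float:
    """Negative log-likelihood of y under the NIG-induced Student-t marginal."""
    omega = 2.0 * beta * (1.0 + nu)
    return (
        0.5 * math.log(math.pi / nu)
        - alpha * math.log(omega)
        + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
        + math.lgamma(alpha)
        - math.lgamma(alpha + 0.5)
    )


def der_regularizer(y: float, gamma: float, nu: float, alpha: float) -> float:
    """Traditional DER regularizer: penalise evidence (2*nu + alpha) on wrong predictions."""
    return abs(y - gamma) * (2.0 * nu + alpha)


if __name__ == "__main__":
    # Hypothetical outputs for one predicted boundary timestamp (illustrative values only).
    gamma, nu, alpha, beta = 12.0, 2.0, 3.0, 4.0
    print(nig_uncertainty(nu, alpha, beta))
    print(der_nll(12.5, gamma, nu, alpha, beta))
    print(der_regularizer(12.5, gamma, nu, alpha))
```

In this formulation a high epistemic value is the signal a model can use to say "I do not know": the evidence nu shrinks for inputs unlike the training data, which inflates Var[mu].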

Code Implementations: 1 repo
