CV | MM | Mar 1, 2025

Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

arXiv:2503.00548v2 | 5 citations | h-index: 3 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses bias issues in video scene graph generation, which is important for applications like video understanding, but it appears incremental as it builds on existing unbiased VidSGG approaches.

The paper tackles biases in Video Scene Graph Generation (VidSGG) by proposing the VISA framework, which uses visual and semantic dual debiasing to improve object representations and reduce semantic bias, resulting in a +13.1% improvement in mR@20 and mR@50 for the SGCLS task under Semi Constraint.

Video Scene Graph Generation (VidSGG) aims to capture dynamic relationships among entities by sequentially analyzing video frames and integrating visual and semantic information. However, VidSGG is challenged by significant biases that skew predictions. To mitigate these biases, we propose a VIsual and Semantic Awareness (VISA) framework for unbiased VidSGG. VISA addresses visual bias through memory-enhanced temporal integration that enhances object representations and concurrently reduces semantic bias by iteratively integrating object features with comprehensive semantic information derived from triplet relationships. This visual-semantics dual debiasing approach results in more unbiased representations of complex scene dynamics. Extensive experiments demonstrate the effectiveness of our method, where VISA outperforms existing unbiased VidSGG approaches by a substantial margin (e.g., +13.1% improvement in mR@20 and mR@50 for the SGCLS task under Semi Constraint).
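The mR@20 and mR@50 figures quoted above are mean Recall@K scores. As a rough illustration of why this metric is favored for judging debiasing, the sketch below contrasts plain Recall@K with mean Recall@K on a single frame. It is a simplified sketch, not the paper's evaluation code: the function names, the toy triplets, and the single-frame setting are all assumptions made for illustration, and real benchmarks typically accumulate per-predicate recall across many frames before averaging.

```python
from collections import defaultdict

def recall_at_k(gt_triplets, ranked_predictions, k):
    """Fraction of ground-truth triplets found in the top-k predictions."""
    if not gt_triplets:
        return 0.0
    top_k = set(ranked_predictions[:k])
    return sum(1 for t in gt_triplets if t in top_k) / len(gt_triplets)

def mean_recall_at_k(gt_triplets, ranked_predictions, k):
    """Recall@K computed per predicate class, then averaged over classes.

    Each triplet is (subject, predicate, object); predictions are assumed
    to be sorted by descending confidence.
    """
    top_k = set(ranked_predictions[:k])
    hits = defaultdict(int)    # correct top-k predictions per predicate
    totals = defaultdict(int)  # ground-truth count per predicate
    for triplet in gt_triplets:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in top_k:
            hits[predicate] += 1
    if not totals:
        return 0.0
    return sum(hits[p] / totals[p] for p in totals) / len(totals)

# Toy data (hypothetical): "looking_at" plays the role of a frequent head predicate.
gt = [
    ("person", "holding", "cup"),
    ("person", "looking_at", "cup"),
    ("person", "looking_at", "phone"),
    ("person", "sitting_on", "chair"),
]
# A biased model ranks the frequent predicate first.
preds = [
    ("person", "looking_at", "cup"),
    ("person", "looking_at", "phone"),
    ("person", "holding", "cup"),
    ("person", "sitting_on", "chair"),
]

r_at_2 = recall_at_k(gt, preds, 2)        # 2 of 4 triplets recovered
mr_at_2 = mean_recall_at_k(gt, preds, 2)  # only 1 of 3 predicate classes recovered
```

In this toy case the plain recall is 0.5 while the mean recall is about 0.33, because the two rarer predicates contribute zero recall each. A model that favors frequent predicates can therefore score well on Recall@K yet poorly on mR@K, which is why gains on mR@K are read as evidence of reduced bias.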

