CV · Jul 26, 2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

arXiv:2107.12270v2 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses joint video-and-language understanding and represents an incremental advance, with gains specific to the Video-and-Language Inference task.

The paper tackles the Video-and-Language Inference task by proposing an adaptive hierarchical graph network with semantic coherence learning. It targets challenges such as judging the global correctness of a statement and reasoning jointly over video and subtitles, and it reports a large improvement over the baseline.

Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. This new task requires a model to infer whether a natural language statement entails or contradicts a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement that involves multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves in-depth understanding of the video over complex interactions. Specifically, it performs joint reasoning over video and subtitles in three hierarchies, where the graph structure is adaptively adjusted according to the semantic structures of the statement. Second, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network across the three hierarchies. Semantic coherence learning can further improve the alignment between vision and linguistics, and the coherence across a sequence of video segments. Experimental results show that our method outperforms the baseline by a large margin.

