CV · Feb 27

EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding

Shitong Sun, Ke Han, Yukai Huang, Weitong Cai, Jifei Song
arXiv:2602.23709v1
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of reasoning over extended video sequences in egocentric video analysis. The authors present it as a new paradigm for this setting.

The paper tackles the understanding of ultra-long egocentric videos spanning multiple days. It introduces EgoGraph, a training-free framework that constructs a temporal knowledge graph to encode long-term dependencies, and reports state-of-the-art performance on the EgoLifeQA and EgoR1-bench benchmarks.

Ultra-long egocentric videos spanning multiple days present significant challenges for video understanding. Existing approaches still rely on fragmented local processing and limited temporal modeling, restricting their ability to reason over such extended sequences. To address these limitations, we introduce EgoGraph, a training-free and dynamic knowledge-graph construction framework that explicitly encodes long-term, cross-entity dependencies in egocentric video streams. EgoGraph employs a novel egocentric schema that unifies the extraction and abstraction of core entities, such as people, objects, locations, and events, and structurally reasons about their attributes and interactions, yielding a significantly richer and more coherent semantic representation than traditional clip-based video models. Crucially, we develop a temporal relational modeling strategy that captures temporal dependencies across entities and accumulates stable long-term memory over multiple days, enabling complex temporal reasoning. Extensive experiments on the EgoLifeQA and EgoR1-bench benchmarks demonstrate that EgoGraph achieves state-of-the-art performance on long-term video question answering, validating its effectiveness as a new paradigm for ultra-long egocentric video understanding.
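
To make the idea of a temporal knowledge graph over typed entities more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not describe EgoGraph's schema or API, so every name here (`TemporalKG`, `Edge`, `history`, the relation strings, the day/second timestamps) is a hypothetical choice for illustration. It only shows how people, objects, locations, and events could be stored as nodes, with timestamped relations that can be read back in chronological order across days.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The four core entity types named in the abstract.
ENTITY_TYPES = {"person", "object", "location", "event"}


@dataclass(frozen=True)
class Edge:
    """A timestamped relation between two entities (hypothetical layout)."""
    subject: str
    relation: str
    obj: str
    day: int        # recording day, starting at 1
    start_s: float  # seconds since the start of that day
    end_s: float


@dataclass
class TemporalKG:
    entities: dict = field(default_factory=dict)  # name -> entity type
    edges: list = field(default_factory=list)
    by_entity: dict = field(default_factory=lambda: defaultdict(list))

    def add_entity(self, name, etype):
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        self.entities.setdefault(name, etype)

    def add_edge(self, edge):
        for name in (edge.subject, edge.obj):
            if name not in self.entities:
                raise KeyError(name)
        self.edges.append(edge)
        # Index the edge under each distinct endpoint for fast lookup.
        for name in {edge.subject, edge.obj}:
            self.by_entity[name].append(edge)

    def history(self, name, first_day=None, last_day=None):
        """Edges touching `name`, in chronological order across days."""
        hits = [
            e for e in self.by_entity.get(name, [])
            if (first_day is None or e.day >= first_day)
            and (last_day is None or e.day <= last_day)
        ]
        return sorted(hits, key=lambda e: (e.day, e.start_s))


kg = TemporalKG()
kg.add_entity("wearer", "person")
kg.add_entity("keys", "object")
kg.add_entity("kitchen", "location")
kg.add_entity("hallway", "location")

# Edges can arrive out of order; history() sorts them by (day, start time).
kg.add_edge(Edge("keys", "located_in", "hallway", day=2, start_s=100.0, end_s=130.0))
kg.add_edge(Edge("keys", "located_in", "kitchen", day=1, start_s=3600.0, end_s=3660.0))
kg.add_edge(Edge("wearer", "picks_up", "keys", day=2, start_s=90.0, end_s=95.0))

last = kg.history("keys")[-1]
print(last.relation, last.obj)  # prints "located_in hallway"
```

A question such as "where were the keys last seen?" then becomes a lookup over one entity's edge history rather than a search over raw clips, which is the kind of multi-day reasoning the abstract attributes to graph-based memory.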

