CV · Sep 17, 2024

AMEGO: Active Memory from long EGOcentric videos

arXiv:2409.10917v1 · 29 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of perceiving unstructured egocentric video for applications such as video question answering, though it appears incremental because it builds on existing video understanding methods.

The authors tackle the problem of understanding very long egocentric videos by introducing AMEGO, which constructs a self-contained representation of each video that captures key locations and object interactions. AMEGO improves performance on the new Active Memories Benchmark, surpassing other video QA baselines by a substantial margin.

Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very long egocentric videos. Inspired by humans' ability to retain information from a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
