OSGNet @ Ego4D Episodic Memory Challenge 2025
Aimed at computer vision researchers, this work addresses temporal localization in egocentric video. It is incremental: it builds on existing methods and contributes mainly a specific (early) fusion strategy.
The paper tackles precise interval localization in untrimmed egocentric videos for the Ego4D Episodic Memory Challenge. It uses an early fusion-based video localization model to improve accuracy over previous late fusion approaches, and it achieved first place in all three tracks (Natural Language Queries, Goal Step, and Moment Queries).
In this report, we present our champion solutions for the three egocentric video localization tracks of the Ego4D Episodic Memory Challenge at CVPR 2025. All three tracks require precisely localizing a temporal interval within an untrimmed egocentric video. Previous unified video localization approaches often rely on late fusion strategies, which tend to yield suboptimal results. To address this, we adopt an early fusion-based video localization model for all three tasks to improve localization accuracy. Our method achieved first place in the Natural Language Queries, Goal Step, and Moment Queries tracks, demonstrating its effectiveness. Our code is available at https://github.com/Yisen-Feng/OSGNet.
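The abstract contrasts late fusion with early fusion without spelling out where the two differ. The Python sketch below is a generic toy illustration of the two strategies, not OSGNet's architecture: every function name, the 3-tap smoothing "temporal encoder", the single-head cross-attention, and the fixed `head` weights are hypothetical stand-ins chosen for readability. In late fusion, clip features pass through the temporal encoder without seeing the query and meet it only in a final similarity score. In early fusion, each clip feature first attends to the query tokens, so the temporal encoder already operates on query-conditioned features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    # Element-wise average of a list of vectors (e.g. pooling query tokens).
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def temporal_encoder(clips: Sequence[Vector]) -> List[Vector]:
    # Hypothetical stand-in for a learned temporal encoder: a fixed 3-tap
    # smoothing filter (0.25, 0.5, 0.25), renormalised at the video boundaries.
    out = []
    for i in range(len(clips)):
        taps = [(j, w) for j, w in ((i - 1, 0.25), (i, 0.5), (i + 1, 0.25))
                if 0 <= j < len(clips)]
        total = sum(w for _, w in taps)
        out.append([sum(w * clips[j][d] for j, w in taps) / total
                    for d in range(len(clips[i]))])
    return out


def cross_attend(clip: Vector, query_tokens: Sequence[Vector]) -> Vector:
    # Single-head dot-product attention from one clip to all query tokens
    # (numerically stable softmax over the attention logits).
    logits = [dot(clip, t) for t in query_tokens]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * t[d] for w, t in zip(weights, query_tokens))
            for d in range(len(clip))]


def late_fusion_scores(clips: Sequence[Vector],
                       query_tokens: Sequence[Vector]) -> List[float]:
    # Video and query are encoded independently; they interact only in the
    # final similarity, so the temporal encoder never sees the query.
    video = temporal_encoder(clips)
    query = mean_pool(query_tokens)
    return [dot(v, query) for v in video]


def early_fusion_scores(clips: Sequence[Vector],
                        query_tokens: Sequence[Vector],
                        head: Vector) -> List[float]:
    # The query is injected into every clip feature BEFORE temporal encoding,
    # so all later layers work on query-conditioned features.
    fused = [[c + a for c, a in zip(clip, cross_attend(clip, query_tokens))]
             for clip in clips]
    video = temporal_encoder(fused)
    # A query-agnostic linear head (hypothetical fixed weights) scores clips.
    return [dot(v, head) for v in video]


def best_clip(scores: Sequence[float]) -> int:
    # Index of the highest-scoring clip (first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    clips = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    query = [[1.0, 0.0]]
    late = late_fusion_scores(clips, query)
    early = early_fusion_scores(clips, query, head=[1.0, 0.0])
    print(best_clip(late), best_clip(early))  # prints "2 2"
```

In a real model the encoder and head would be learned and would predict interval boundaries rather than a single best clip; the sketch only shows where the query enters the pipeline. Both toy variants pick clip 2 here because the example is constructed so that one clip clearly matches the query.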