CV · Feb 25, 2025

Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

arXiv:2502.17753v2 · 1 citation · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses procedural activity understanding in egocentric video and offers an incremental advance over existing methods.

The paper learns task graphs from procedural activities in egocentric videos using a gradient-based maximum likelihood estimation method, improving F1-score by +14.5%, +10.2%, and +13.6% on CaptainCook4D, EgoPER, and EgoProceL, respectively.

We introduce a gradient-based approach for learning task graphs from procedural activities, improving over hand-crafted methods. Our method directly optimizes edge weights via maximum likelihood, enabling integration into neural architectures. We validate our approach on CaptainCook4D, EgoPER, and EgoProceL, achieving F1-score improvements of +14.5%, +10.2%, and +13.6%, respectively. Our feature-based approach for predicting task graphs from textual/video embeddings demonstrates emerging video understanding abilities. We also achieve top performance on the procedure understanding benchmark of Ego-Exo4D and significantly improve online mistake detection (+19.8% on Assembly101-O, +6.4% on EPIC-Tent-O). Code: https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.
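To make "directly optimizes edge weights via maximum likelihood" concrete, here is a minimal pure-Python sketch of the general idea. It is not the authors' implementation, and the paper's actual loss may include further terms. The assumptions are ours: node 0 is a hypothetical START node, each edge weight Z[k][j] (read as "step j is a precondition of step k") is the sigmoid of a free logit, the probability of the next step is proportional to its total edge weight toward already-observed steps, and the two toy sequences and the names `nll_and_grad` and `fit` are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nll_and_grad(theta, sequences):
    """Return the negative log-likelihood of the sequences and its gradient.

    theta[k][j] is the logit of edge weight Z[k][j]. Node 0 is an assumed
    START node that is always observed before the first step.
    """
    n = len(theta)
    Z = [[sigmoid(v) for v in row] for row in theta]
    grad = [[0.0] * n for _ in range(n)]
    nll = 0.0
    for seq in sequences:
        seen = [0]
        for k_t in seq:
            unseen = [h for h in range(1, n) if h not in seen]
            # f[h]: total weight from candidate h to the observed steps.
            f = {h: sum(Z[h][j] for j in seen) for h in unseen}
            denom = sum(f.values())
            nll -= math.log(f[k_t] / denom)
            for h in unseen:
                # d(-log p) / dZ[h][j], identical for every observed j.
                d = 1.0 / denom - (1.0 / f[k_t] if h == k_t else 0.0)
                for j in seen:
                    # Chain rule through the sigmoid parameterization.
                    grad[h][j] += d * Z[h][j] * (1.0 - Z[h][j])
            seen.append(k_t)
    return nll, grad


def fit(sequences, n_nodes, lr=0.5, steps=200):
    """Full-batch gradient descent on the negative log-likelihood."""
    theta = [[0.0] * n_nodes for _ in range(n_nodes)]
    history = []
    for _ in range(steps):
        nll, grad = nll_and_grad(theta, sequences)
        history.append(nll)
        for k in range(n_nodes):
            for j in range(n_nodes):
                theta[k][j] -= lr * grad[k][j]
    Z = [[sigmoid(v) for v in row] for row in theta]
    return Z, history


# Toy data (assumed): step 1 always comes first, steps 2 and 3 in either order.
sequences = [[1, 2, 3], [1, 3, 2]]
Z, history = fit(sequences, n_nodes=4)
```

In this toy run the weight Z[1][0] rises above its initial value of 0.5, while Z[2][0] and Z[3][0] fall below it, reflecting that step 1 always directly follows START. Because the objective is differentiable in the edge logits, the same loss could in principle be driven by the output of a neural network instead of free parameters, which is the kind of integration the abstract refers to.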

Code Implementations: 1 repo