Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
This work addresses the challenge of procedural activity understanding for video analysis, with incremental improvements over existing methods.
The paper learns task graphs from procedural activities in egocentric videos with a gradient-based maximum likelihood estimation method, improving F1-score by +14.5%, +10.2%, and +13.6% on CaptainCook4D, EgoPER, and EgoProceL, respectively.
We introduce a gradient-based approach for learning task graphs from procedural activities, improving over hand-crafted methods. Our method directly optimizes the weights of the task graph's edges via maximum likelihood, which allows it to be integrated into neural architectures. We validate our approach on CaptainCook4D, EgoPER, and EgoProceL, achieving F1-score improvements of +14.5%, +10.2%, and +13.6%, respectively. Our feature-based approach, which predicts task graphs from textual or video embeddings, demonstrates emerging video understanding abilities. We also achieve top performance on the procedure understanding benchmark of Ego-Exo4D and significantly improve online mistake detection (+19.8% on Assembly101-O, +6.4% on EPIC-Tent-O). Code: https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.
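To make the idea of optimizing edge weights by maximum likelihood concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation or loss: the likelihood model (a key-step's feasibility is the sum of its edge scores toward already observed key-steps, normalized over the key-steps not yet observed), the virtual START node, the helper names `nll_and_grad` and `fit`, the toy sequences, and the hyperparameters are all assumptions chosen for illustration. Gradients are derived by hand here; an autodiff framework would normally take their place.

```python
import math

START = 0  # hypothetical virtual start node, always treated as already observed


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nll_and_grad(theta, sequences):
    """Negative log-likelihood of the sequences and its gradient w.r.t. theta.

    Assumed toy model: z[i][j] = sigmoid(theta[i][j]) scores the edge
    "key-step i requires key-step j". The feasibility of a candidate step is
    the sum of its scores toward observed steps, and the probability of the
    next step is its feasibility normalized over all unobserved steps.
    """
    n = len(theta)
    z = [[sigmoid(v) for v in row] for row in theta]
    grad = [[0.0] * n for _ in range(n)]
    loss = 0.0
    for seq in sequences:
        past = [START]
        for cur in seq:
            candidates = [h for h in range(n) if h not in past]
            feas = {h: sum(z[h][j] for j in past) for h in candidates}
            total = sum(feas.values())
            loss -= math.log(feas[cur] / total)
            for h in candidates:
                # Derivative of -log(feas[cur] / total) w.r.t. feas[h].
                d_feas = 1.0 / total - (1.0 / feas[cur] if h == cur else 0.0)
                for j in past:
                    # Chain rule through the sigmoid: dz/dtheta = z * (1 - z).
                    grad[h][j] += d_feas * z[h][j] * (1.0 - z[h][j])
            past.append(cur)
    return loss, grad


def fit(sequences, n_nodes, lr=0.2, epochs=500):
    """Plain gradient descent on the edge logits, starting from zeros."""
    theta = [[0.0] * n_nodes for _ in range(n_nodes)]
    initial_loss, _ = nll_and_grad(theta, sequences)
    for _ in range(epochs):
        _, grad = nll_and_grad(theta, sequences)
        for i in range(n_nodes):
            for j in range(n_nodes):
                theta[i][j] -= lr * grad[i][j]
    final_loss, _ = nll_and_grad(theta, sequences)
    z = [[sigmoid(v) for v in row] for row in theta]
    return z, initial_loss, final_loss


# Toy procedure: A comes first, B and C follow in either order, D comes last.
A, B, C, D = 1, 2, 3, 4
sequences = [[A, B, C, D], [A, C, B, D]]
z, initial_loss, final_loss = fit(sequences, n_nodes=5)
print(f"NLL: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"score(A requires START) = {z[A][START]:.2f}")
print(f"score(D requires START) = {z[D][START]:.2f}")
```

In this sketch the score linking A to START grows while the score linking D to START shrinks, because A is always the first observed key-step and D never is. The toy likelihood is not expected to recover a clean task graph on its own; it only illustrates how sequence likelihood can drive edge weights through gradient updates.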