CuriosAI Submission to the EgoExo4D Proficiency Estimation Challenge 2025
This work addresses proficiency estimation for skill assessment in computer vision. Its contribution is incremental: it adapts existing models, Sapiens-2B and VideoMAE, to a specific challenge.
The paper tackles multi-view skill assessment in the EgoExo4D Proficiency Estimation Challenge with two methods. The stronger of the two, a two-stage pipeline, reaches 47.8% accuracy, which supports the use of scenario-conditioned modeling.
This report presents the CuriosAI team's submission to the EgoExo4D Proficiency Estimation Challenge at CVPR 2025. We propose two methods for multi-view skill assessment: (1) a multi-task learning framework using Sapiens-2B that jointly predicts proficiency and scenario labels (43.6% accuracy), and (2) a two-stage pipeline combining zero-shot scenario recognition with view-specific VideoMAE classifiers (47.8% accuracy). The two-stage approach outperforms the multi-task framework by 4.2 percentage points, indicating that scenario-conditioned modeling is effective for proficiency estimation.
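The control flow of the two-stage pipeline can be illustrated with a minimal, framework-free sketch. It assumes that stage 1 performs zero-shot recognition by picking the scenario whose text-prompt embedding is most similar (cosine similarity) to a clip embedding, a CLIP-style rule, and that stage 2 averages the class probabilities of the per-view classifiers trained for the recognized scenario. The function names, embedding inputs, and averaging rule are hypothetical stand-ins rather than the submission's actual code, and the VideoMAE classifiers are replaced by toy callables.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical: a per-view classifier maps view features to class probabilities.
Classifier = Callable[[Vector], List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize_scenario(clip_emb: Vector, prompt_embs: Dict[str, Vector]) -> str:
    """Stage 1 (assumed zero-shot rule): nearest scenario prompt embedding."""
    return max(prompt_embs, key=lambda s: cosine(clip_emb, prompt_embs[s]))


def estimate_proficiency(
    clip_emb: Vector,
    view_features: Dict[str, Vector],
    prompt_embs: Dict[str, Vector],
    classifiers: Dict[Tuple[str, str], Classifier],
) -> Tuple[str, int]:
    """Stage 2: route each view to the classifier for (scenario, view), then
    average probabilities across views (assumed fusion rule) and take argmax."""
    scenario = recognize_scenario(clip_emb, prompt_embs)
    probs = [
        classifiers[(scenario, view)](feats)
        for view, feats in view_features.items()
        if (scenario, view) in classifiers
    ]
    if not probs:
        raise ValueError(f"no view-specific classifier for scenario {scenario!r}")
    num_classes = len(probs[0])
    avg = [sum(p[i] for p in probs) / len(probs) for i in range(num_classes)]
    return scenario, max(range(num_classes), key=avg.__getitem__)


# Toy usage with made-up embeddings and constant classifiers.
prompts = {"cooking": [1.0, 0.0], "basketball": [0.0, 1.0]}
toy_classifiers = {
    ("cooking", "ego"): lambda f: [0.1, 0.6, 0.3],
    ("cooking", "exo1"): lambda f: [0.2, 0.2, 0.6],
    ("basketball", "ego"): lambda f: [0.9, 0.05, 0.05],
}
views = {"ego": [0.5, 0.5], "exo1": [0.3, 0.7]}
result = estimate_proficiency([0.9, 0.1], views, prompts, toy_classifiers)
print(result)  # ('cooking', 2)
```

Routing by scenario means each view-specific classifier only has to separate proficiency levels within one activity, which is one plausible reason scenario conditioning helps; the basketball classifier above is never consulted because stage 1 selects cooking.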