CV | AI | LG | Sep 16, 2025

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection

Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Sicong Li, Qingming Huang

arXiv:2509.12990v28.42 citations | h-index: 28 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of long-tailed mistake detection in egocentric videos. The advance is incremental: it builds on existing methods such as ViViT and MoE, with task-specific adaptations.

The paper tackles the problem of detecting subtle and infrequent mistakes in egocentric video data by proposing a Dual-Stage Reweighted Mixture-of-Experts framework, achieving strong performance in identifying rare and ambiguous mistake instances.

In this report, we address the problem of determining whether a user performs an action incorrectly from egocentric video data. To handle the challenges posed by subtle and infrequent mistakes, we propose a Dual-Stage Reweighted Mixture-of-Experts (DR-MoE) framework. In the first stage, features are extracted using a frozen ViViT model and a LoRA-tuned ViViT model, which are combined through a feature-level expert module. In the second stage, three classifiers are trained with different objectives: reweighted cross-entropy to mitigate class imbalance, AUC loss to improve ranking under skewed distributions, and label-aware loss with sharpness-aware minimization to enhance calibration and generalization. Their predictions are fused using a classification-level expert module. The proposed method achieves strong performance, particularly in identifying rare and ambiguous mistake instances. The code is available at https://github.com/boyuh/DR-MoE.
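
The second-stage ideas in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). The abstract does not give the exact loss forms or the gating design, so the inverse-frequency weighting, the squared-hinge pairwise AUC surrogate, and the softmax gate below are assumptions, and all function names are hypothetical. The label-aware loss with sharpness-aware minimization is omitted.

```python
import math

def inverse_frequency_weights(labels, num_classes=2):
    """Assumed scheme: weight class c by n / (num_classes * count_c),
    so the rare (mistake) class contributes more to the loss."""
    n = len(labels)
    counts = [sum(1 for y in labels if y == c) for c in range(num_classes)]
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]

def reweighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); probs[i] is P(mistake) for sample i."""
    eps = 1e-12
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += weights[y] * -math.log(max(p_true, eps))
        norm += weights[y]
    return total / norm

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Squared-hinge surrogate for AUC: penalize every (positive, negative)
    pair whose score gap is smaller than the margin."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    penalty = sum(max(0.0, margin - (sp - sn)) ** 2 for sp in pos for sn in neg)
    return penalty / (len(pos) * len(neg))

def fuse_experts(expert_probs, gate_logits):
    """Softmax-gated convex combination of per-expert mistake probabilities."""
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    z = sum(exps)
    return sum(e / z * p for e, p in zip(exps, expert_probs))

# Toy usage: three correct executions and one mistake (label 1).
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
ce = reweighted_cross_entropy([0.1, 0.2, 0.3, 0.6], labels, weights)
auc = pairwise_auc_loss([-1.0, -0.5, 0.2, 0.8], labels)
fused = fuse_experts([0.55, 0.70, 0.62], [0.0, 0.0, 0.0])
```

In this sketch the gate logits are fixed inputs. In a trained mixture-of-experts they would be produced by a learned gating network, which the abstract refers to as the classification-level expert module.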

