CV | AI | LG | Sep 16, 2025

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection

arXiv:2509.12990v2 | 2 citations | h-index: 28 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses long-tailed mistake detection in egocentric videos. The advance is incremental: it builds on existing methods such as ViViT and Mixture-of-Experts (MoE), with task-specific adaptations.

The paper tackles the problem of detecting subtle and infrequent mistakes in egocentric video data by proposing a Dual-Stage Reweighted Mixture-of-Experts framework, achieving strong performance in identifying rare and ambiguous mistake instances.

In this report, we address the problem of determining whether a user performs an action incorrectly from egocentric video data. To handle the challenges posed by subtle and infrequent mistakes, we propose a Dual-Stage Reweighted Mixture-of-Experts (DR-MoE) framework. In the first stage, features are extracted using a frozen ViViT model and a LoRA-tuned ViViT model, which are combined through a feature-level expert module. In the second stage, three classifiers are trained with different objectives: reweighted cross-entropy to mitigate class imbalance, AUC loss to improve ranking under skewed distributions, and label-aware loss with sharpness-aware minimization to enhance calibration and generalization. Their predictions are fused using a classification-level expert module. The proposed method achieves strong performance, particularly in identifying rare and ambiguous mistake instances. The code is available at https://github.com/boyuh/DR-MoE.
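To make the second stage more concrete, the following is a minimal sketch of two of its ingredients: a class-reweighted cross-entropy and a gated fusion of several classifiers' predictions. It is not the authors' implementation (see the linked repository for that). The abstract does not state the weighting scheme or the form of the gate, so the inverse-frequency weights, the softmax gate, and all function names below are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes=2):
    """Assumed scheme: w_c = N / (K * n_c), so rare classes get larger weights."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]


def reweighted_cross_entropy(logits, labels, weights):
    """Weighted mean of -log p(y), normalized by the sum of sample weights."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        probs = softmax(z)
        total += -weights[y] * math.log(probs[y])
        norm += weights[y]
    return total / norm


def fuse_experts(expert_probs, gate_logits):
    """Hypothetical classification-level fusion: a softmax gate mixes the
    probability vectors produced by the individual classifiers."""
    gate = softmax(gate_logits)
    num_classes = len(expert_probs[0])
    return [
        sum(g * p[c] for g, p in zip(gate, expert_probs))
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # class 1 ("mistake") is the rare class
    weights = inverse_frequency_weights(labels)
    logits = [[2.0, -1.0], [1.5, 0.0], [0.5, 0.2], [-0.3, 0.8]]
    print("class weights:", weights)
    print("loss:", reweighted_cross_entropy(logits, labels, weights))
    # Three classifiers (e.g. trained with different objectives), equal gate.
    experts = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
    print("fused:", fuse_experts(experts, [0.0, 0.0, 0.0]))
```

In this sketch the gate logits are fixed by hand; in a trained mixture they would typically be produced by a small learned network conditioned on the input features.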

