MoE3D: A Mixture-of-Experts Module for 3D Reconstruction
This work addresses blurred depth discontinuities in feed-forward 3D reconstruction, offering an incremental but computationally efficient improvement.
The paper tackles the problem of blurry depth boundaries in feed-forward 3D reconstruction by introducing a mixture-of-experts module that combines multiple smooth depth predictions with per-pixel weighting, resulting in reduced artifacts and improved accuracy with negligible added inference cost.
We propose a simple yet effective approach to improving feed-forward 3D reconstruction models. Existing methods often struggle near depth discontinuities, where standard regression losses encourage spatial averaging: a pixel whose depth may belong to either the foreground or the background is pushed toward an intermediate value that lies on neither surface, which blurs sharp boundaries. To address this issue, we introduce a mixture-of-experts formulation that handles uncertainty at depth boundaries by combining multiple smooth depth predictions, with a softmax weighting head that dynamically selects among these hypotheses on a per-pixel basis. By integrating our mixture model into a pre-trained, state-of-the-art 3D reconstruction model, we substantially reduce boundary artifacts and improve overall reconstruction accuracy. Notably, our approach is highly compute-efficient: it delivers generalizable improvements even when fine-tuned on a small subset of the training data and adds only negligible inference cost, suggesting a promising direction for lightweight and accurate 3D reconstruction.
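The sketch below illustrates the two ideas above on a toy scanline. It is not the authors' implementation. It assumes a network that outputs K smooth depth maps of shape (K, H, W) plus K per-pixel logits from a weighting head. The number of experts, the architecture, and whether the final depth is a hard per-pixel selection or a softmax-weighted blend are not specified here, so both options are shown. The helper names `softmax` and `combine_experts` are hypothetical. The last few lines show why a single L2 regressor blurs an ambiguous boundary pixel: the value minimizing squared error is the mean of the two candidate depths.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    # Numerically stable softmax along the expert axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def combine_experts(depths: np.ndarray, logits: np.ndarray, hard: bool = True) -> np.ndarray:
    """Fuse K depth hypotheses (K, H, W) using per-pixel logits (K, H, W).

    Hypothetical helper. hard=True keeps the highest-weighted expert at each
    pixel (no blending across a boundary); hard=False returns the
    softmax-weighted average of the hypotheses.
    """
    weights = softmax(logits, axis=0)
    if hard:
        idx = weights.argmax(axis=0)  # (H, W) index of the winning expert
        return np.take_along_axis(depths, idx[None], axis=0)[0]
    return (weights * depths).sum(axis=0)


# Toy 1x4 scanline with a depth step between columns 1 and 2.
near = np.full((1, 4), 1.0)  # smooth hypothesis on the foreground surface
far = np.full((1, 4), 5.0)   # smooth hypothesis on the background surface
depths = np.stack([near, far])  # (K=2, H=1, W=4)
logits = np.array([[[3.0, 3.0, -3.0, -3.0]],
                   [[-3.0, -3.0, 3.0, 3.0]]])  # assumed weighting-head scores

print(combine_experts(depths, logits, hard=True))  # [[1. 1. 5. 5.]]

# A single regressor at an ambiguous boundary pixel whose target is
# 1.0 or 5.0 with equal frequency: grid-search the MSE-optimal prediction.
targets = np.array([1.0, 5.0])
candidates = np.arange(0, 601) / 100.0  # 0.00 .. 6.00
mse = ((candidates[:, None] - targets[None, :]) ** 2).mean(axis=1)
best = candidates[mse.argmin()]
print(best)  # 3.0: the mean, which lies on neither surface
```

With hard selection the step edge survives exactly, because each pixel takes its value from one smooth hypothesis; the soft path instead returns a convex combination that can still fall between the surfaces when the weights are uncertain.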