CV · AI · LG · May 30

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

arXiv:2606.008448.1
Predicted impact: top 97% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For object detection practitioners, MoEIoU provides a simple yet effective improvement over existing IoU-based losses, though the gains are incremental.

The paper introduces MoEIoU, a mixture-of-experts-based loss function for bounding-box regression that adaptively weights overlap, center-alignment, and aspect-ratio penalties during training. The authors report that it consistently outperforms standard and recent state-of-the-art losses on PASCAL VOC, HRIPCB, and MS COCO, with faster convergence and improved localization accuracy.

Bounding-box regression is a fundamental component of object detection, playing a critical role in precise object localization. Existing Intersection-over-Union (IoU)-based loss functions extend the IoU objective by incorporating geometric penalties, such as center-distance and aspect-ratio mismatch, to improve bounding-box regression. However, these penalties typically remain fixed throughout training and do not account for the optimization dynamics: predicted boxes initially exhibit large center-distance and shape errors, while later stages focus on improving overlap with the ground truth. To address this limitation, we introduce MoEIoU, a mixture-of-experts-based regression loss that jointly models overlap, center alignment, and aspect-ratio mismatch. MoEIoU aggregates these components using a log-sum-exp function, which emphasizes the dominant localization error while maintaining smooth contributions from the other terms. Additionally, a curriculum-based weighting schedule prioritizes correcting box position and shape in early training stages and improving overlap in later stages. We evaluate the proposed MoEIoU on PASCAL VOC, HRIPCB, and MS COCO using multiple YOLO architectures, along with large-scale simulation experiments. It consistently outperforms standard and recent state-of-the-art losses, demonstrating faster convergence and improved localization accuracy. We further show that this adaptive aggregation improves existing IoU-based losses, yielding consistent gains and providing more effective optimization guidance for bounding-box regression in object detection frameworks.
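The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: three per-box penalties combined by a weighted log-sum-exp, with weights that shift over training. Everything specific here is an assumption for illustration, not the paper's definition: the DIoU-style center term, the CIoU-style aspect-ratio term, the linear schedule in the hypothetical `curriculum_weights`, the temperature `tau`, and all function names.

```python
import math

def loss_terms(pred, gt, eps=1e-9):
    """Return (overlap, center, aspect) penalties for boxes (x1, y1, x2, y2).

    Assumed forms: 1 - IoU, a DIoU-style normalized center distance,
    and a CIoU-style aspect-ratio mismatch.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap penalty: 1 - IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    overlap = 1.0 - inter / (union + eps)

    # Center penalty: squared center distance over the squared diagonal
    # of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    center = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Aspect-ratio penalty: squared difference of arctan(w / h), scaled to [0, 1].
    aspect = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    return overlap, center, aspect

def curriculum_weights(progress):
    """Hypothetical linear schedule; progress runs from 0 (start) to 1 (end).

    The overlap weight grows from 0.2 to 0.8; the remainder is split
    evenly between the center and aspect terms. Weights sum to 1.
    """
    p = min(max(progress, 0.0), 1.0)
    w_overlap = 0.2 + 0.6 * p
    rest = 1.0 - w_overlap
    return w_overlap, rest / 2, rest / 2

def lse_aggregate(terms, weights, tau=0.1):
    """Weighted log-sum-exp: tau * log(sum_i w_i * exp(term_i / tau)).

    Small tau approaches max(terms); large tau approaches the weighted mean.
    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(t / tau for t in terms)
    s = sum(w * math.exp(t / tau - m) for t, w in zip(terms, weights))
    return tau * (m + math.log(s))

def moe_style_loss(pred, gt, progress, tau=0.1):
    """Combine the three penalties for one box pair at a given training progress."""
    return lse_aggregate(loss_terms(pred, gt), curriculum_weights(progress), tau)

# Example: the same misaligned prediction scored early and late in training.
early = moe_style_loss((1.0, 1.0, 4.0, 3.0), (0.0, 0.0, 2.0, 2.0), progress=0.0)
late = moe_style_loss((1.0, 1.0, 4.0, 3.0), (0.0, 0.0, 2.0, 2.0), progress=1.0)
```

With weights that sum to 1, the aggregate always lies between the weighted mean and the maximum of the three penalties, which is one way to read the abstract's "emphasizes the dominant localization error while maintaining smooth contributions from the other terms". A training implementation would compute the same quantities on batched tensors in an autodiff framework.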

