CV · Mar 10

YOLO-NAS-Bench: A Surrogate Benchmark with Self-Evolving Predictors for YOLO Architecture Search

Zhe Li, Xiaoyu Ding, Jiaxin Zheng, Yongtao Wang

arXiv:2603.09405v1 · 5.6 · h-index: 3

Predicted impact: top 85% in CV · last 90 days · Originality: Incremental advance

AI Analysis

The work addresses the lack of NAS benchmarks for object detection and gives the detection community a reusable evaluation tool, though the advance is incremental because it builds on existing NAS and YOLO concepts.

The paper tackles the high evaluation cost of Neural Architecture Search (NAS) for object detection by introducing YOLO-NAS-Bench, a surrogate benchmark for YOLO-style detectors. Its self-evolving predictor improves R² from 0.770 to 0.815 and Sparse Kendall Tau from 0.694 to 0.752, and evolutionary search guided by the final predictor finds architectures that surpass the official YOLOv8 through YOLO12 baselines at comparable latency on COCO-mini.
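For readers unfamiliar with the two reported metrics, here is a minimal sketch of how R² and a Kendall Tau rank correlation can be computed for a surrogate predictor. The `sparse_kendall_tau` variant assumes the definition used in earlier surrogate NAS benchmarks, where predictions are rounded to a coarse precision before ranking; whether YOLO-NAS-Bench uses exactly this definition, and the `precision=0.1` default, are assumptions. All function names and the sample values are illustrative.

```python
import math
from itertools import combinations


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kendall_tau(y_true, y_pred):
    """Kendall tau-b: pairwise rank agreement with a correction for ties."""
    concordant = discordant = tied_pred_only = tied_true_only = 0
    for i, j in combinations(range(len(y_true)), 2):
        d_true = y_true[i] - y_true[j]
        d_pred = y_pred[i] - y_pred[j]
        if d_true == 0 or d_pred == 0:
            if d_true != 0:
                tied_pred_only += 1
            if d_pred != 0:
                tied_true_only += 1
            continue
        if d_true * d_pred > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_pred_only) * (untied + tied_true_only))
    return (concordant - discordant) / denom if denom else 0.0


def sparse_kendall_tau(y_true, y_pred, precision=0.1):
    """Assumed definition: bucket predictions to `precision`, then tau-b."""
    buckets = [round(p / precision) for p in y_pred]
    return kendall_tau(y_true, buckets)


# Illustrative values only (not results from the paper).
true_scores = [30.0, 31.2, 33.5, 35.1]
pred_scores = [30.4, 30.9, 34.0, 34.6]
print(r2_score(true_scores, pred_scores))
print(sparse_kendall_tau(true_scores, pred_scores))
```

Rounding before ranking means that near-identical predictions count as ties rather than as correct or incorrect orderings, so the sparse variant does not reward a predictor for separating architectures by negligible margins.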

Neural Architecture Search (NAS) for object detection is severely bottlenecked by high evaluation cost, as fully training each candidate YOLO architecture on COCO demands days of GPU time. Meanwhile, existing NAS benchmarks largely target image classification, leaving the detection community without a comparable benchmark for NAS evaluation. To address this gap, we introduce YOLO-NAS-Bench, the first surrogate benchmark tailored to YOLO-style detectors. YOLO-NAS-Bench defines a search space spanning channel width, block depth, and operator type across both backbone and neck, covering the core modules of YOLOv8 through YOLO12. We sample 1,000 architectures via random, stratified, and Latin Hypercube strategies, train them on COCO-mini, and build a LightGBM surrogate predictor. To sharpen the predictor in the high-performance regime most relevant to NAS, we propose a Self-Evolving Mechanism that progressively aligns the predictor's training distribution with the high-performance frontier, by using the predictor itself to discover and evaluate informative architectures in each iteration. This method grows the pool to 1,500 architectures and raises the ensemble predictor's R² from 0.770 to 0.815 and Sparse Kendall Tau from 0.694 to 0.752, demonstrating strong predictive accuracy and ranking consistency. Using the final predictor as the fitness function for evolutionary search, we discover architectures that surpass all official YOLOv8-YOLO12 baselines at comparable latency on COCO-mini, confirming the predictor's discriminative power for top-performing detection architectures.
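The self-evolving idea in the abstract can be illustrated with a toy loop: fit a surrogate on the current pool, use it to rank fresh candidates, evaluate the most promising ones for real, add them to the pool, and repeat. The sketch below is not the authors' implementation. It assumes a hypothetical 12-gene architecture encoding, replaces COCO-mini training with a synthetic `toy_evaluate` function, swaps the LightGBM ensemble for a simple k-nearest-neighbour surrogate so that it runs with the standard library only, and assumes that candidates are selected by top predicted score.

```python
import random

N_STAGES = 4          # hypothetical number of searchable stages
CHOICES = (4, 3, 3)   # hypothetical option counts: width, depth, operator


def sample_arch(rng):
    """Draw one architecture as a flat tuple of per-stage choice indices."""
    return tuple(rng.randrange(n) for _ in range(N_STAGES) for n in CHOICES)


def toy_evaluate(arch):
    """Placeholder for training on COCO-mini: a synthetic, deterministic score."""
    return sum(gene * (1.0 + 0.1 * (pos % 3)) for pos, gene in enumerate(arch))


def predict(pool, arch, k=5):
    """k-NN surrogate (stand-in for LightGBM): mean score of the k closest archs."""
    def hamming(other):
        return sum(a != b for a, b in zip(arch, other))
    nearest = sorted(pool, key=lambda item: hamming(item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def self_evolve(pool, rounds, n_candidates, top_k, rng):
    """Grow the pool toward the region the surrogate currently rates highest."""
    pool = list(pool)
    seen = {arch for arch, _ in pool}
    for _ in range(rounds):
        # Sample fresh candidates, dropping duplicates and known architectures.
        fresh = dict.fromkeys(sample_arch(rng) for _ in range(n_candidates))
        candidates = [arch for arch in fresh if arch not in seen]
        # Rank by predicted score, then pay the real evaluation cost for the best.
        candidates.sort(key=lambda arch: predict(pool, arch), reverse=True)
        for arch in candidates[:top_k]:
            pool.append((arch, toy_evaluate(arch)))
            seen.add(arch)
    return pool


rng = random.Random(0)
initial = {}
while len(initial) < 30:  # small initial pool of unique architectures
    arch = sample_arch(rng)
    initial[arch] = toy_evaluate(arch)

pool = self_evolve(list(initial.items()), rounds=3, n_candidates=200, top_k=10, rng=rng)
print(len(pool))
```

Because the surrogate is refit on the enlarged pool in every round (here implicitly, since the k-NN predictor reads the pool directly), later rounds draw on more training data near the high-scoring region, which is the regime the abstract identifies as most relevant to NAS.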
