CV AI LG Oct 11, 2024

Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes

Joris Guerin, Shray Bansal, Amirreza Shaban, Paulo Mann, Harshvardhan Gazula

arXiv:2410.08592v2 2.0 h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses a challenge faced by computer vision practitioners who need efficient and effective backbone selection when working with limited annotated data. It is incremental, however, in that it formalizes and tests existing ideas.

The paper tackles the problem of selecting the best vision backbone for image classification in low-data regimes. It shows that dataset-specific selection, found by searching a pool of over 1300 pretrained models for ten minutes on a single GPU, outperforms the recommendations of generic benchmarks.

Transfer learning has become an essential tool in modern computer vision, allowing practitioners to leverage backbones pretrained on large datasets to train successful models from limited annotated data. Choosing the right backbone is crucial, especially for small datasets, since final performance depends heavily on the quality of the initial feature representations. While prior work has conducted benchmarks across various datasets to identify universal top-performing backbones, we demonstrate that backbone effectiveness is highly dataset-dependent, especially in low-data scenarios where no single backbone consistently excels. To overcome this limitation, we introduce dataset-specific backbone selection as a new research direction and investigate its practical viability in low-data regimes. Since exhaustive evaluation is computationally impractical for large backbone pools, we formalize Vision Backbone Efficient Selection (VIBES) as the problem of searching for high-performing backbones under computational constraints. We define the solution space, propose several heuristics, and demonstrate the feasibility of VIBES for low-data image classification through experiments on four diverse datasets. Our results show that even simple search strategies can find well-suited backbones within a pool of over 1300 pretrained models, outperforming generic benchmark recommendations within just ten minutes of search time on a single GPU (NVIDIA RTX A5000).
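The idea of searching a backbone pool under a compute budget can be illustrated with a minimal sketch. This is not the paper's method: it assumes plain random search as a stand-in for a "simple search strategy", and the names `budgeted_random_search`, `evaluate`, and `toy_scores` are hypothetical. In practice `evaluate` would score a backbone on the target dataset (for example, by fitting a lightweight classifier on its frozen features); here it is a lookup in a toy table.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple


def budgeted_random_search(
    backbones: Sequence[str],
    evaluate: Callable[[str], float],
    budget_seconds: float,
    seed: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[str], float, int]:
    """Evaluate backbones in random order until the time budget runs out.

    Returns (best backbone name, its score, number of backbones evaluated).
    If the budget is exhausted before any evaluation, the name is None.
    """
    rng = random.Random(seed)
    order = list(backbones)
    rng.shuffle(order)  # assumption: uniform random order, no prior over backbones

    start = clock()
    best_name: Optional[str] = None
    best_score = float("-inf")
    n_evaluated = 0
    for name in order:
        # Stop before starting a new evaluation once the budget is spent.
        if clock() - start >= budget_seconds:
            break
        score = evaluate(name)
        n_evaluated += 1
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, n_evaluated


# Hypothetical validation accuracies standing in for real evaluations.
toy_scores = {"backbone_a": 0.50, "backbone_b": 0.90, "backbone_c": 0.70}
best, score, n = budgeted_random_search(
    list(toy_scores), toy_scores.__getitem__, budget_seconds=600.0
)
```

The budget is checked only between evaluations, so a single slow evaluation can overshoot it; a stricter variant would also bound the time spent inside `evaluate`.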
