Does progress on ImageNet transfer to real-world datasets?
This work highlights a gap in how machine learning models are evaluated for practical applications: gains on ImageNet may not reflect performance on real-world tasks.
The study investigates whether improvements in ImageNet accuracy transfer to real-world image classification datasets. It finds that higher ImageNet accuracy does not consistently lead to better performance on tasks such as camera trap or satellite image classification, and that data augmentation can improve performance on some tasks even when architectural changes do not.
Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57% to 83%) on six practical image classification datasets. In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models. On multiple datasets, models with higher ImageNet accuracy do not consistently yield performance improvements. For certain tasks, interventions such as data augmentation improve performance even when architectures do not. We hope that future benchmarks will include more diverse datasets to encourage a more comprehensive approach to improving learning algorithms.
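
One simple way to quantify whether ImageNet accuracy "transfers" is to rank-correlate each model's ImageNet accuracy with its accuracy on a downstream dataset. The sketch below is only an illustration of that idea, not the analysis used in the study: the function names `ranks` and `spearman` are hypothetical helpers, and the accuracy values are made-up placeholders, not reported results.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder numbers for illustration only; NOT results from the study.
imagenet_acc = [57.0, 65.0, 71.0, 76.0, 80.0, 83.0]
downstream_acc = [90.1, 91.5, 91.2, 91.4, 91.0, 91.3]

rho = spearman(imagenet_acc, downstream_acc)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near 1 would indicate that better ImageNet models are also better on the downstream task, while a value near 0 would indicate that ImageNet ranking says little about downstream ranking.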