CV LG Mar 2, 2023

A Meta-Learning Approach to Predicting Performance and Data Requirements

Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

arXiv:2303.01598v1 6.89 citations h-index: 137

Originality: Incremental advance

AI Analysis

This paper addresses the problem of predicting how much data a machine learning practitioner needs to reach a target performance. It offers a more accurate method for resource planning, though it is an incremental advance over existing power-law approaches.

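The paper tackles the problem of estimating the number of samples needed for a model to achieve a target performance, showing that the standard power law produces large errors when extrapolating from few-shot data. It introduces a piecewise power law (PPL) whose parameters are estimated by a meta-learned random forest regressor. Compared to the power law, the PPL improves performance estimation on average by 37% across 16 classification datasets and 33% across 10 detection datasets. A confidence bound that limits the prediction horizon reduces data over-estimation by 76% on classification and 91% on detection.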

We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-shot regime followed by a linear progression in the high-shot regime. We introduce a novel piecewise power law (PPL) that handles the two data regimes differently. To estimate the parameters of the PPL, we introduce a random forest regressor trained via meta learning that generalizes across classification/detection tasks, ResNet/ViT based architectures, and random/pre-trained initializations. The PPL improves the performance estimation on average by 37% across 16 classification and 33% across 10 detection datasets, compared to the power law. We further extend the PPL to provide a confidence bound and use it to limit the prediction horizon that reduces over-estimation of data by 76% on classification and 91% on detection datasets.
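As a rough illustration of the two ideas in the abstract, the sketch below fits the ordinary power law by a straight-line fit in log-log space, inverts it to estimate the sample count for a target error, and evaluates a two-regime curve. The abstract gives neither the exact functional form of the PPL nor its fitting procedure, so the piecewise form here (quadratic in log-size below a breakpoint, tangent line above it) is an assumption for illustration only. The function names, parameter names, and synthetic data are hypothetical, and the meta-learned random forest is not reproduced.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error ~ a * size**b in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space (negative if error falls)
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back from log space
    return a, b


def samples_for_target(a, b, target_error):
    """Invert error = a * size**b to get the size that reaches target_error."""
    return (target_error / a) ** (1.0 / b)


def piecewise_log_error(size, theta0, theta1, theta2, break_size):
    """Hypothetical two-regime curve (an assumption, not the paper's formula).

    Below break_size the log-error is quadratic in log-size (nonlinear,
    few-shot regime); above it the curve continues along the tangent line
    (linear, high-shot regime), so value and slope match at the breakpoint.
    """
    x = math.log(size)
    xb = math.log(break_size)
    if x <= xb:
        return theta0 + theta1 * x + theta2 * x * x
    value_at_break = theta0 + theta1 * xb + theta2 * xb * xb
    slope_at_break = theta1 + 2.0 * theta2 * xb
    return value_at_break + slope_at_break * (x - xb)


# Synthetic example: errors generated exactly from error = 2 * size**-0.5.
sizes = [50, 100, 200, 400, 800]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
needed = samples_for_target(a, b, target_error=0.05)
```

On this noise-free synthetic data the fit recovers the generating parameters, so `needed` comes out at about 1600 samples. Real few-shot measurements would bend away from the straight line in log-log space, which is the failure mode the abstract describes.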
