Experimental Design for Overparameterized Learning with Application to Single Shot Deep Active Learning
This work addresses the data labeling bottleneck faced by deep learning practitioners by adapting experimental design to modern overparameterized models; the contribution is incremental in that it builds on classical experimental design theory.
The paper tackles the problem of selecting data points for labeling in overparameterized models like deep neural networks, where classical experimental design methods fail because they focus on variance reduction for underparameterized models. The authors propose a new design strategy for overparameterized regression and interpolation, and demonstrate its application with a single-shot deep active learning algorithm, showing improved performance in experiments.
The impressive performance exhibited by modern machine learning models hinges on the ability to train such models on very large amounts of labeled data. However, since access to large volumes of labeled data is often limited or expensive, it is desirable to alleviate this bottleneck by carefully curating the training set. Optimal experimental design is a well-established paradigm for selecting data points to be labeled so as to maximally inform the learning process. Unfortunately, classical theory on optimal experimental design focuses on selecting examples in order to learn underparameterized (and thus, non-interpolative) models, while modern machine learning models such as deep neural networks are overparameterized, and oftentimes are trained to be interpolative. As such, classical experimental design methods are not applicable in many modern learning setups. Indeed, the predictive performance of underparameterized models tends to be variance dominated, so classical experimental design focuses on variance reduction, while the predictive performance of overparameterized models can also be, as is shown in this paper, bias dominated or of mixed nature. In this paper we propose a design strategy that is well suited for overparameterized regression and interpolation, and we demonstrate the applicability of our method in the context of deep learning by proposing a new algorithm for single shot deep active learning.
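The contrast between variance-dominated and bias-dominated error can be made concrete with a short derivation. What follows is a minimal sketch under assumptions that are not taken from the paper: a linear model with true parameter $w \in \mathbb{R}^d$, a selected design matrix $X_S \in \mathbb{R}^{n \times d}$ whose rows are the labeled points, homoscedastic noise of variance $\sigma^2$, and the minimum-norm least-squares estimator. It measures error in parameter space rather than prediction risk, so it illustrates the phenomenon and is not necessarily the criterion the authors optimize.

```latex
\begin{align*}
y &= X_S w + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 I_n, \\
\hat{w} &= X_S^{+} y, \qquad P_S = X_S^{+} X_S
  \quad \text{(orthogonal projection onto the row space of } X_S\text{)}, \\
\mathbb{E}[\hat{w}] &= P_S w, \qquad
\operatorname{Cov}(\hat{w}) = \sigma^2 X_S^{+} (X_S^{+})^{\top} = \sigma^2 (X_S^{\top} X_S)^{+}, \\
\mathbb{E}\,\|\hat{w} - w\|_2^2
  &= \underbrace{\|(I_d - P_S)\, w\|_2^2}_{\text{bias}}
   + \underbrace{\sigma^2 \operatorname{tr}\!\big((X_S^{\top} X_S)^{+}\big)}_{\text{variance}}.
\end{align*}
```

When $n \ge d$ and $X_S$ has full column rank, $P_S = I_d$, the bias term vanishes, and only $\sigma^2 \operatorname{tr}\big((X_S^{\top} X_S)^{-1}\big)$ remains, which is the quantity minimized by the classical A-optimality criterion. When $n < d$, the row space of $X_S$ cannot span $\mathbb{R}^d$, so the bias term is generally nonzero and depends on which points are selected; a design that only reduces variance ignores it.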