ML LG | Jun 4, 2019

Towards Task and Architecture-Independent Generalization Gap Predictors

Scott Yak, Javier Gonzalvo, Hanna Mazzawi

arXiv:1906.01550v114.127 citations

Originality: Incremental advance

AI Analysis

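This work addresses the problem of predicting generalization gaps in deep learning. It is an incremental advance: it extends the linear predictor of Jiang et al. (2018) to DNN and RNN predictors.

The paper asks whether deep learning can predict when deep learning works, and trains task-independent and architecture-independent generalization gap predictors on 13,500 trained networks. In the setting of Jiang et al. (2018), the DNN and RNN predictors outperform the linear model, obtaining R^2=0.965. On the architecture-independent, task-independent, and out-of-distribution prediction tasks, both again outperform linear models, with RNNs obtaining R^2=0.584.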

Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$.
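As a rough illustration of how a linear baseline of this kind is scored, the sketch below fits a least-squares predictor of the generalization gap and reports $R^2$ on held-out networks. Everything in it is an assumption for illustration: the features and gaps are synthetic stand-ins (not the paper's dataset or features), the helper names `fit_linear_predictor`, `predict`, and `r_squared` are hypothetical, and the gap is assumed to mean training accuracy minus test accuracy.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_linear_predictor(features, gaps):
    """Least-squares linear map (with bias) from per-network features to gaps."""
    X = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(X, gaps, rcond=None)
    return weights

def predict(weights, features):
    """Apply the fitted linear map to a feature matrix."""
    X = np.column_stack([features, np.ones(len(features))])
    return X @ weights

# Synthetic stand-in data (assumption): 200 "networks", 4 features each,
# with a gap that is linear in the features plus a little noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
true_w = np.array([0.05, -0.02, 0.0, 0.01])
gaps = features @ true_w + 0.1 + rng.normal(scale=0.01, size=200)

# Fit on the first 150 networks, score on the remaining 50.
train, test = slice(0, 150), slice(150, 200)
weights = fit_linear_predictor(features[train], gaps[train])
score = r_squared(gaps[test], predict(weights, features[test]))
print(f"held-out R^2 of the linear baseline: {score:.3f}")
```

An $R^2$ of 1 means the predictor explains all variance in the held-out gaps, while 0 means it does no better than predicting the mean gap; this is the scale on which the reported 0.965 and 0.584 should be read.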
