ML | LG | Jun 4, 2019

Towards Task and Architecture-Independent Generalization Gap Predictors

arXiv:1906.01550v1 | 27 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of predicting generalization gaps in deep learning, a question relevant to both researchers and practitioners. It is an incremental advance: it extends the method of Jiang et al. (2018) to DNN and RNN predictors.

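The paper asks whether deep learning can predict when deep learning works, and develops task- and architecture-independent generalization gap predictors. In the setting extended from Jiang et al. (2018), DNN and RNN predictors outperform the linear model, obtaining $R^2=0.965$. On the architecture-independent, task-independent, and out-of-distribution prediction tasks, RNNs obtain $R^2=0.584$.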

Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$.
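
To make the reported metric concrete, here is a minimal sketch of what a generalization gap and an $R^2$ score measure. It assumes the gap is training accuracy minus test accuracy and fits a one-feature linear baseline. The numbers, the scalar feature, and the helper names (`generalization_gap`, `fit_line`, `r_squared`) are all hypothetical and are not taken from the paper or its code.

```python
# Illustrative only: toy numbers, not data or results from the paper.

def generalization_gap(train_acc, test_acc):
    """Assumed definition: training accuracy minus test accuracy."""
    return train_acc - test_acc

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical trained networks: (train accuracy, test accuracy, scalar feature).
networks = [
    (0.99, 0.90, 0.2),
    (0.98, 0.80, 0.4),
    (1.00, 0.70, 0.6),
    (0.97, 0.62, 0.7),
]

gaps = [generalization_gap(tr, te) for tr, te, _ in networks]
features = [f for _, _, f in networks]

# Linear baseline: predict the gap from the single feature.
slope, intercept = fit_line(features, gaps)
predicted = [slope * f + intercept for f in features]
r2 = r_squared(gaps, predicted)
print(f"R^2 of the linear baseline on the toy data: {r2:.3f}")
```

An $R^2$ of 1 means the predictor reproduces every gap exactly, while 0 means it does no better than always predicting the mean gap. The DNN and RNN predictors described in the abstract would take the place of `fit_line` in this sketch.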

Code Implementations: 1 repo
