LG · CV · Jan 16, 2013

Big Neural Networks Waste Capacity

arXiv:1301.3583v4 · 87 citations
AI Analysis

The paper addresses a fundamental optimization bottleneck in deep learning for large-scale applications such as ImageNet, though the contribution is incremental because it builds on previously reported diminishing returns from larger networks.

The paper identifies that large neural networks fail to leverage added capacity to reduce underfitting, with experiments on ImageNet LSVRC-2010 showing highly diminishing returns in training error as capacity increases. This suggests that first-order gradient descent optimization fails in this regime, potentially hindering generalization on large datasets requiring high capacity.

This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be due to the fact that there are highly diminishing returns for capacity in terms of training error, leading to underfitting. This suggests that the optimization method, first-order gradient descent, fails in this regime. Directly attacking this problem, either through the optimization method or the choice of parametrization, may improve the generalization error on large datasets, for which a large capacity is required.

