Capacity allocation analysis of neural networks: A tool for principled architecture design
The paper provides a tool for principled architecture design and addresses a foundational issue in machine learning, though the contribution is incremental because it builds on existing linear analysis methods.
The paper addresses the problem of understanding what neural network architectures focus their modeling capacity on for a given task. It introduces capacity allocation analysis, which quantifies the effective number of parameters allocated to dependencies in the input space, and applies it to compare classical architectures on synthetic tasks.
Designing neural network architectures is a task that lies somewhere between science and art. For a given task, some architectures are eventually preferred over others, based on a mix of intuition, experience, experimentation and luck. For many tasks, the final word is attributed to the loss function, while for some others a further perceptual evaluation is necessary to assess and compare performance across models. In this paper, we introduce the concept of capacity allocation analysis, with the aim of shedding some light on what network architectures focus their modelling capacity on, when used on a given task. We focus more particularly on spatial capacity allocation, which analyzes a posteriori the effective number of parameters that a given model has allocated for modelling dependencies on a given point or region in the input space, in linear settings. We use this framework to perform a quantitative comparison between some classical architectures on various synthetic tasks. Finally, we consider how capacity allocation might translate in non-linear settings.
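To make the notion of spatial capacity allocation in a linear setting more concrete, the sketch below shows one possible way to compute such a quantity. It is an illustrative assumption, not the definition used in the paper: the model is taken to be y = W(theta) x with W linear in theta, the effective number of parameters is taken to be the rank of the Jacobian of vec(W) with respect to theta, and that total is spread over input positions using the diagonal of the orthogonal projector onto the Jacobian's column space. All function names (`weight_jacobian`, `spatial_allocation`, `conv_weights`, `dense_weights`) and the layer sizes are hypothetical.

```python
import numpy as np

def weight_jacobian(build_weights, num_params):
    """Jacobian of vec(W) w.r.t. theta, assuming W is linear in theta with W(0) = 0."""
    cols = []
    for p in range(num_params):
        e = np.zeros(num_params)
        e[p] = 1.0
        cols.append(build_weights(e).ravel())  # column p = d vec(W) / d theta_p
    return np.stack(cols, axis=1)

def spatial_allocation(build_weights, num_params):
    """Illustrative (assumed) allocation of effective parameters to input positions."""
    J = weight_jacobian(build_weights, num_params)
    U, s, _ = np.linalg.svd(J, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s.max()))   # effective number of parameters
    U = U[:, :rank]                           # orthonormal basis of col(J)
    leverage = np.sum(U ** 2, axis=1)         # diagonal of the projector onto col(J)
    n_out, n_in = build_weights(np.zeros(num_params)).shape
    # Sum over output rows so each entry is the capacity tied to one input position.
    return leverage.reshape(n_out, n_in).sum(axis=0)

N_IN, KERNEL = 6, 3
N_OUT = N_IN - KERNEL + 1  # "valid" 1D convolution, 4 outputs

def conv_weights(theta):
    """Weight matrix of a 1D convolution: one shared kernel slid over the input."""
    W = np.zeros((N_OUT, N_IN))
    for i in range(N_OUT):
        W[i, i:i + KERNEL] = theta
    return W

def dense_weights(theta):
    """Weight matrix of a fully connected layer: every entry is a free parameter."""
    return theta.reshape(N_OUT, N_IN)

conv_alloc = spatial_allocation(conv_weights, KERNEL)
dense_alloc = spatial_allocation(dense_weights, N_OUT * N_IN)
print("conv :", np.round(conv_alloc, 3), "total =", round(float(conv_alloc.sum()), 3))
print("dense:", np.round(dense_alloc, 3), "total =", round(float(dense_alloc.sum()), 3))
```

Under these assumptions the allocation always sums to the rank of the Jacobian, so the convolutional layer distributes 3 effective parameters across the input (with less capacity at the borders than in the centre), while the dense layer distributes 24 uniformly. This is the kind of quantitative, per-position comparison between architectures that the framework is meant to enable.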