Dynamic Capacity Networks
The work is relevant to deep learning practitioners concerned with computational cost, since it yields cheaper models; it is incremental, however, in that it builds on existing neural network architectures.
The paper tackles the problem of reducing the computational cost of neural networks. It introduces Dynamic Capacity Networks (DCNs), which adaptively allocate capacity across the input by combining low- and high-capacity sub-networks, with a gradient-based attention mechanism deciding where the high-capacity sub-networks are applied. On Cluttered MNIST and SVHN, DCNs drastically reduce the number of computations while matching or improving on the performance of traditional convolutional networks.
We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input and also guide the selection of a few portions of the input on which to apply the high-capacity sub-networks. The selection is made using a novel gradient-based attention mechanism that efficiently identifies the input regions to which the DCN's output is most sensitive, and to which we should therefore devote more capacity. We focus our empirical evaluation on the Cluttered MNIST and SVHN image datasets. Our findings indicate that DCNs are able to drastically reduce the number of computations, compared to traditional convolutional neural networks, while maintaining similar or even better performance.
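The selection step can be illustrated with a small plain-Python sketch. It is a toy under stated assumptions, not the authors' implementation: the abstract says only that the attention is gradient-based, so measuring sensitivity as the gradient of the output distribution's entropy with respect to each location's low-capacity features, picking the top-k locations by gradient norm, and swapping in high-capacity features at those locations are assumptions made here. The sub-networks (coarse_features, fine_features), the squared-mean-pooling top layer, the weights and the per-patch costs are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def coarse_features(patch: Sequence[float]) -> Vector:
    """Hypothetical low-capacity sub-network: two cheap O(n) statistics."""
    return [sum(patch) / len(patch), max(patch) - min(patch)]


def fine_features(patch: Sequence[float]) -> Vector:
    """Hypothetical high-capacity sub-network. It returns a vector of the same
    size as coarse_features so it can replace the coarse vector in place."""
    n = len(patch)
    mean = sum(patch) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in patch) / n)
    # O(n^2) pairwise term: a stand-in for "more computation per patch".
    pairs = sum(patch[a] * patch[c] for a in range(n) for c in range(a + 1, n))
    return [mean + 0.1 * pairs / max(1, n * (n - 1) // 2), std]


def top_layer(features: List[Vector], W: List[Vector], b: Vector) -> Tuple[Vector, float]:
    """Hypothetical top network: square each feature, mean-pool over locations,
    then linear layer and softmax. Returns (class probabilities, entropy)."""
    n, d = len(features), len(features[0])
    pooled = [sum(h[j] ** 2 for h in features) / n for j in range(d)]
    logits = [b[k] + sum(W[k][j] * pooled[j] for j in range(d)) for k in range(len(W))]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return probs, entropy


def entropy_saliency(features: List[Vector], W: List[Vector], b: Vector) -> List[float]:
    """L2 norm of dH/dh_i at each location i, via hand-written backprop."""
    probs, H = top_layer(features, W, b)
    n, d = len(features), len(features[0])
    # Softmax-entropy gradient: dH/dlogit_k = -p_k * (log p_k + H).
    dlogits = [-p * (math.log(p) + H) if p > 0 else 0.0 for p in probs]
    dpooled = [sum(W[k][j] * dlogits[k] for k in range(len(W))) for j in range(d)]
    # Chain rule through mean-pooled squares: d pooled_j / d h_i[j] = 2 * h_i[j] / n.
    return [
        math.sqrt(sum((dpooled[j] * 2.0 * h[j] / n) ** 2 for j in range(d)))
        for h in features
    ]


def dcn_forward(patches: List[Vector], W: List[Vector], b: Vector,
                k: int) -> Tuple[Vector, List[int]]:
    coarse = [coarse_features(p) for p in patches]  # low capacity everywhere
    saliency = entropy_saliency(coarse, W, b)
    chosen = sorted(range(len(patches)), key=lambda i: saliency[i], reverse=True)[:k]
    mixed = list(coarse)
    for i in chosen:  # high capacity only on the k most salient patches
        mixed[i] = fine_features(patches[i])
    probs, _ = top_layer(mixed, W, b)
    return probs, sorted(chosen)


def cost_ratio(n_patches: int, k: int, cost_low: float, cost_high: float) -> float:
    """Feature-extraction cost of the DCN relative to running the high-capacity
    network on every patch (ignores top layers and the saliency backward pass)."""
    return (n_patches * cost_low + k * cost_high) / (n_patches * cost_high)
```

In this toy setting, blank patches have all-zero low-capacity features and therefore zero saliency, so with k = 1 the single high-capacity evaluation goes to the one patch that has content. With hypothetical per-patch costs of 1 and 10 units, 64 patches and k = 4, cost_ratio(64, 4, 1.0, 10.0) evaluates to 104/640 = 0.1625, about 16% of the cost of applying the high-capacity network everywhere; actual savings depend on the real architectures and on the top-layer and backward-pass costs this count leaves out.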