Separable Layers Enable Structured Efficient Linear Substitutions
This work provides an incremental improvement for deep learning practitioners by enhancing network efficiency without sacrificing task performance.
The paper tackles the problem of improving efficiency/accuracy tradeoffs in neural networks by replacing linear components in pointwise convolutions with structured linear decompositions, resulting in Pareto-optimal benefits in computation and parameter count.
Following the recent development of efficient dense layers, this paper shows that simply replacing the linear components in pointwise convolutions with structured linear decompositions also produces substantial gains in the efficiency/accuracy tradeoff. A pointwise (1x1) convolution is a fully connected layer applied at every spatial position, so it can be replaced directly by a structured transform. Networks using such layers learn the same tasks as those using standard convolutions and provide Pareto-optimal benefits in efficiency/accuracy, both in computation (mult-adds) and in parameter count (and hence memory). Code is available at https://github.com/BayesWatch/deficient-efficient.
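To make the substitution concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a low-rank factorization W = U V as one illustrative example of a structured linear decomposition; the summary above does not say which decompositions the paper evaluates. All function names (`pointwise_conv`, `matmul`, `dense_cost`, `low_rank_cost`) are hypothetical. The sketch shows that a pointwise convolution is a matrix-vector product at each pixel, that applying V and then U pointwise gives the same output as applying W, and how parameter count and mult-adds compare.

```python
# Illustrative sketch only. Low-rank factorization is an assumed stand-in
# for "structured linear decomposition"; all names here are hypothetical.

def pointwise_conv(x, w):
    """Apply a 1x1 convolution.

    x: feature map, x[row][col] is a list of c_in channel values.
    w: weight matrix, w[o] is a list of c_in weights for output channel o.
    Each pixel is multiplied by the same matrix w, i.e. a fully
    connected layer applied independently at every spatial position.
    """
    return [
        [[sum(wi * pi for wi, pi in zip(w_row, pixel)) for w_row in w]
         for pixel in row]
        for row in x
    ]

def matmul(a, b):
    """Plain matrix product of nested lists a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def dense_cost(c_in, c_out, height, width):
    """Parameters and mult-adds of a dense pointwise convolution (no bias)."""
    params = c_in * c_out
    return params, params * height * width

def low_rank_cost(c_in, c_out, rank, height, width):
    """Parameters and mult-adds when W (c_out x c_in) is replaced by U V,
    with U of shape c_out x rank and V of shape rank x c_in (no bias)."""
    params = rank * (c_in + c_out)
    return params, params * height * width

# Tiny exact check with integers: c_in = 3, c_out = 2, rank = 1.
V = [[1, 2, 3]]        # rank x c_in
U = [[2], [-1]]        # c_out x rank
W = matmul(U, V)       # dense matrix that the factorization represents

x = [[[1, 0, 1], [2, 1, 0]]]  # a 1 x 2 feature map with 3 channels

dense_out = pointwise_conv(x, W)
factored_out = pointwise_conv(pointwise_conv(x, V), U)

dense_params, dense_macs = dense_cost(256, 256, 32, 32)
lr_params, lr_macs = low_rank_cost(256, 256, 32, 32, 32)
```

In this sketch a 256-to-256 channel pointwise layer with an assumed rank of 32 needs a quarter of the parameters and mult-adds of the dense layer. The cost depends only on the chosen rank and channel widths; whether accuracy is preserved at a given rank is an empirical question that this example does not address.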