Universal Approximation with Quadratic Deep Networks
This work aims to improve the efficiency and capability of neural networks, a problem relevant to researchers and practitioners in machine learning, though it appears incremental because it builds on prior studies of quadratic neurons.
The paper compares the expressive capabilities of quadratic deep networks with those of conventional neural networks. It proves four theorems showing that quadratic networks (1) can approximate certain functions more efficiently, (2) can express functions that conventional networks with the same structure cannot, (3) offer new insight into universal approximation, and (4) in quantized form, require fewer weights to reach the same error bound.
Deep learning has recently achieved major successes in many important applications. In our previous studies, we proposed quadratic (second-order) neurons and deep quadratic neural networks. In a quadratic neuron, the inner product of the input vector and the corresponding weights in a conventional neuron is replaced with a quadratic function. The resulting quadratic neuron has greater expressive capability than the conventional neuron. However, how quadratic neurons improve the expressive capability of a deep quadratic network has not yet been studied, particularly in relation to that of a conventional neural network. To address this, we ask four basic questions in this paper: (1) For the one-hidden-layer network structure, is there any function that a quadratic network can approximate much more efficiently than a conventional network? (2) For the same multi-layer network structure, is there any function that can be expressed by a quadratic network but cannot be expressed with conventional neurons in the same structure? (3) Does a quadratic network give new insight into universal approximation? (4) To approximate the same class of functions with the same error bound, can a quantized quadratic network use fewer weights than a quantized conventional network? Our main contributions are four interconnected theorems that shed light on these four questions and demonstrate the merits of a quadratic network in terms of expressive efficiency, unique capability, compact architecture, and computational capacity, respectively.
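To make the difference between the two neuron types concrete, the following is a minimal sketch in plain Python. The abstract does not state the exact form of the quadratic function, so the parameterization used here, a product of two affine terms plus a weighted sum of squared inputs, is an assumption chosen for illustration, and all function and parameter names (`quadratic_neuron`, `w_r`, `w_g`, `w_b`, `c`) are hypothetical.

```python
from typing import Callable, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def relu(z: float) -> float:
    return max(0.0, z)


def conventional_neuron(x, w, b, activation: Callable[[float], float] = relu) -> float:
    # Conventional neuron: activation applied to an affine function of x.
    return activation(dot(w, x) + b)


def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c,
                     activation: Callable[[float], float] = relu) -> float:
    # Assumed quadratic form (not given in the abstract):
    #   (w_r . x + b_r) * (w_g . x + b_g) + w_b . (x * x) + c
    squared = [xi * xi for xi in x]
    z = (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squared) + c
    return activation(z)


def squared_norm_neuron(x: Sequence[float]) -> float:
    # With w_r = w_g = 0 and w_b = 1, a single quadratic neuron computes
    # the squared Euclidean norm; ReLU leaves it unchanged because it is >= 0.
    n = len(x)
    zeros = [0.0] * n
    return quadratic_neuron(x, zeros, 0.0, zeros, 0.0, [1.0] * n, 0.0)


if __name__ == "__main__":
    print(squared_norm_neuron([3.0, 4.0]))
    # The product x1 * x2 from one neuron, using an identity activation.
    print(quadratic_neuron([2.0, -3.0], [1.0, 0.0], 0.0, [0.0, 1.0], 0.0,
                           [0.0, 0.0], 0.0, activation=lambda z: z))
```

Under this assumed form, a single quadratic neuron on an n-dimensional input carries 3n + 3 parameters, against n + 1 for a conventional neuron, so any efficiency comparison between the two network types has to account for the cost per neuron as well as the number of neurons.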