Efficient Deep Learning of GMMs
The paper's result offers a theoretical explanation for the efficiency of deep neural networks in practical classification problems such as speech, image, and text classification, though the contribution is incremental in that it builds on the known universality of Gaussian distributions.
The paper tackles the problem of efficiently classifying Gaussian mixture models, showing that a deep neural network with two hidden layers requires only $O(n)$ neurons for optimal classification, while a shallow network with a single hidden layer needs at least $O(\exp(n))$ neurons or exponentially large coefficients.
We show that a collection of Gaussian mixture models (GMMs) in $\mathbb{R}^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image, and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.
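To make the classification target concrete, the sketch below implements the decision rule that a network would have to reproduce, assuming "optimally classified" means the Bayes (maximum-posterior) rule: assign a point to the class whose prior-weighted GMM density is largest. This is an illustrative reading, not the paper's network construction. The function names (`log_gaussian`, `log_gmm`, `classify`), the equal class priors, and the toy two-class example in $\mathbb{R}^{2}$ are all assumptions introduced here.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log-density of the Gaussian N(mean, cov) evaluated at point x."""
    n = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)         # log|cov|, numerically stable
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)

def log_gmm(x, weights, means, covs):
    """Log-density of a GMM at x, combined with the log-sum-exp trick."""
    terms = np.array([np.log(w) + log_gaussian(x, m, c)
                      for w, m, c in zip(weights, means, covs)])
    top = terms.max()
    return top + np.log(np.exp(terms - top).sum())

def classify(x, gmms, priors):
    """Index of the class with the largest log(prior) + log GMM density.

    Assumption: this maximum-posterior rule is what 'optimal' refers to.
    """
    scores = [np.log(p) + log_gmm(x, *g) for g, p in zip(gmms, priors)]
    return int(np.argmax(scores))

# Hypothetical toy problem: two classes in R^2 with identity covariances.
eye = np.eye(2)
class0 = ([0.5, 0.5],                                   # mixture weights
          [np.array([-3.0, 0.0]), np.array([3.0, 0.0])],  # component means
          [eye, eye])                                   # component covariances
class1 = ([1.0], [np.array([0.0, 0.0])], [eye])
gmms = [class0, class1]
priors = [0.5, 0.5]  # assumed equal class priors

label_far = classify(np.array([3.0, 0.0]), gmms, priors)
label_center = classify(np.array([0.0, 0.0]), gmms, priors)
```

In this toy setup, a point sitting on one of class 0's component means is assigned to class 0, while the origin is assigned to class 1. The quadratic form inside `log_gaussian` followed by the exponential and sum inside `log_gmm` is the two-stage computation whose cost the abstract's neuron counts refer to.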