Deep Semi-Random Features for Nonlinear Function Approximation
This work addresses the challenge of balancing flexibility and efficiency in function approximation for machine learning practitioners, though it appears incremental as it builds on existing random feature and neural network methods.
The authors tackled the problem of nonlinear function approximation by proposing semi-random features, which offer a middle ground between deep learning and kernel methods, and demonstrated that these features can match neural network performance with slightly more units and outperform random features with significantly fewer units.
We propose semi-random features for nonlinear function approximation. The flexibility of semi-random feature lies between the fully adjustable units in deep learning and the random features used in kernel methods. For one hidden layer models with semi-random features, we prove with no unrealistic assumptions that the model classes contain an arbitrarily good function as the width increases (universality), and despite non-convexity, we can find such a good function (optimization theory) that generalizes to unseen new data (generalization bound). For deep models, with no unrealistic assumptions, we prove universal approximation ability, a lower bound on approximation error, a partial optimization guarantee, and a generalization bound. Depending on the problems, the generalization bound of deep semi-random features can be exponentially better than the known bounds of deep ReLU nets; our generalization error bound can be independent of the depth, the number of trainable weights as well as the input dimensionality. In experiments, we show that semi-random features can match the performance of neural networks by using slightly more units, and it outperforms random features by using significantly fewer units. Moreover, we introduce a new implicit ensemble method by using semi-random features.