LGCVMLMay 23, 2018

Learning towards Minimum Hyperspherical Energy

arXiv:1805.09298v9169 citations
Originality Incremental advance
AI Analysis

This addresses the issue of representation redundancy in neural networks for machine learning practitioners, offering a novel regularization approach that is incremental in nature.

The paper tackles the problem of neuron redundancy in over-parametrized neural networks, which can hurt generalization and increase computation, by proposing a minimum hyperspherical energy (MHE) regularization method inspired by the Thomson problem, and demonstrates its effectiveness through extensive experiments on challenging tasks.

Neural networks are a powerful class of nonlinear functions that can be trained end-to-end on various applications. While the over-parametrization nature in many neural networks renders the ability to fit complex functions and the strong representation power to handle challenging tasks, it also leads to highly correlated neurons that can hurt the generalization ability and incur unnecessary computation cost. As a result, how to regularize the network to avoid undesired representation redundancy becomes an important issue. To this end, we draw inspiration from a well-known problem in physics -- Thomson problem, where one seeks to find a state that distributes N electrons on a unit sphere as evenly as possible with minimum potential energy. In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks. We also propose a few novel variants of MHE, and provide some insights from a theoretical point of view. Finally, we apply neural networks with MHE regularization to several challenging tasks. Extensive experiments demonstrate the effectiveness of our intuition, by showing the superior performance with MHE regularization.

Code Implementations4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes