von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning
This work addresses the problem of choosing an embedding geometry for supervised learning and offers guidance for practitioners, though its contribution is incremental because it builds on existing softmax-based approaches.
The paper systematically compares softmax losses across embedding geometries (Euclidean, hyperbolic, spherical) on classification and retrieval tasks. A property observed for the spherical losses motivates a proposed von Mises-Fisher classifier that is competitive with state-of-the-art methods and improves calibration.
Recent work has argued that classification losses based on softmax cross-entropy are superior not only for fixed-set classification tasks but also for open-set tasks, including few-shot learning and retrieval, where they outperform losses developed specifically for those tasks. Softmax classifiers have been studied using different embedding geometries -- Euclidean, hyperbolic, and spherical -- and claims have been made about the superiority of one or another, but they have not been systematically compared with careful controls. We conduct an empirical investigation of embedding geometry on softmax losses for a variety of fixed-set classification and image retrieval tasks. An interesting property observed for the spherical losses leads us to propose a probabilistic classifier based on the von Mises-Fisher distribution, and we show that it is competitive with state-of-the-art methods while producing improved out-of-the-box calibration. We provide guidance regarding the trade-offs between losses and how to choose among them.
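To make the notion of "embedding geometry" concrete, the following minimal sketch shows one common way a softmax classifier can score an embedding against class prototypes in each of the three geometries. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the prototype-based scoring, the `scale` parameter, the choice of the Poincaré ball as the hyperbolic model, and the toy vectors are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(logits):
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def euclidean_logits(z, prototypes):
    # Assumed form: negative squared Euclidean distance to each prototype.
    return [-sum((a - b) ** 2 for a, b in zip(z, p)) for p in prototypes]

def spherical_logits(z, prototypes, scale=10.0):
    # Assumed form: cosine similarity times a hypothetical inverse temperature.
    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [a / n for a in v]
    zn = normalize(z)
    return [scale * dot(zn, normalize(p)) for p in prototypes]

def hyperbolic_logits(z, prototypes):
    # Assumed form: negative Poincare-ball distance; all points need norm < 1.
    def poincare_distance(u, v):
        diff = sum((a - b) ** 2 for a, b in zip(u, v))
        denom = (1.0 - dot(u, u)) * (1.0 - dot(v, v))
        return math.acosh(1.0 + 2.0 * diff / denom)
    return [-poincare_distance(z, p) for p in prototypes]

# Toy two-dimensional embedding and three class prototypes (illustrative values).
z = [0.3, 0.1]
prototypes = [[0.4, 0.1], [-0.2, 0.5], [0.0, -0.6]]

geometries = {
    "euclidean": euclidean_logits,
    "spherical": spherical_logits,
    "hyperbolic": hyperbolic_logits,
}
probs = {name: softmax(fn(z, prototypes)) for name, fn in geometries.items()}

for name, p in probs.items():
    print(name, [round(x, 3) for x in p])
```

All three variants feed their logits into the same softmax, so the geometry only changes how similarity between an embedding and a class is measured. The confidence each variant assigns to the predicted class can differ noticeably, which is one reason calibration is a natural axis of comparison.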