Fast Label Embeddings via Randomized Linear Algebra
The work targets efficiency problems in machine learning applications with many labels, such as text classification, although it appears incremental, since it builds on existing label embedding techniques.
The paper tackles the computational cost of large output spaces in multiclass and multilabel problems with a fast randomized algorithm for computing label embeddings, reporting an exponential speedup over naive methods and state-of-the-art results on two large-scale datasets.
Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency. In this work we use a correspondence between rank-constrained estimation and low-dimensional label embeddings that uncovers a fast label embedding algorithm which works in both the multiclass and multilabel settings. The result is a randomized algorithm whose running time is exponentially faster than naive algorithms. We demonstrate our techniques on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project, where we obtain state-of-the-art results.
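One way to read the correspondence between rank-constrained estimation and label embeddings is through reduced-rank least squares: the best rank-k solution of min ||Y - XW|| projects the ordinary least-squares fit onto the top-k right singular vectors of the fitted values P_X Y, and those singular vectors assign each label a k-dimensional embedding. The sketch below is a minimal illustration under that assumption (squared loss, dense NumPy arrays), using a randomized range finder in the style of Halko, Martinsson, and Tropp to avoid forming the full fit. It is not the authors' implementation; the function names and the `oversample` and `n_power_iter` parameters are hypothetical.

```python
import numpy as np


def project_onto_features(X, Z):
    """Return P_X Z = X (X^T X)^+ X^T Z, the least-squares fitted values of Z on X."""
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return X @ coef


def randomized_label_embedding(X, Y, k, oversample=10, n_power_iter=2, seed=0):
    """Hypothetical sketch: k-dim label embedding via randomized reduced-rank regression.

    X: (n, d) feature matrix; Y: (n, c) label indicator matrix.
    Returns (V, s): V is (c, k) with orthonormal columns (row j embeds label j),
    s holds the top-k singular values of the fitted-value matrix F = P_X Y.
    """
    rng = np.random.default_rng(seed)
    n_labels = Y.shape[1]
    m = min(k + oversample, n_labels)
    # Gaussian test matrix over the label space.
    omega = rng.standard_normal((n_labels, m))
    # Sketch the column space of F = P_X Y without ever forming F.
    Q, _ = np.linalg.qr(project_onto_features(X, Y @ omega))
    # Power iterations sharpen the spectrum: F F^T Q = P_X Y (Y^T Q).
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(project_onto_features(X, Y @ (Y.T @ Q)))
    # Q lies in col(X), so Q^T F = Q^T Y, a small (m x n_labels) matrix.
    B = Q.T @ Y
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k].T, s[:k]
```

In this sketch Y is touched only through products with thin (k + oversample)-column matrices, so the label dimension never enters a dense d × c solve. The returned V could then, for example, be used to train a k-output regressor on the compressed targets Y @ V instead of on all c labels.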