Self-organized Hierarchical Softmax
This work addresses the computational bottleneck of the output softmax in neural language models for NLP applications, offering an incremental improvement by learning the hierarchical structure automatically rather than fixing it in advance.
The paper tackles efficient language modeling over large vocabularies by proposing a self-organizing hierarchical softmax that learns word clusters with syntactic and semantic meaning during training. It achieves performance comparable to or better than similar full softmax models while running about as fast as other efficient softmax approximations.
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.
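To make the idea of a hierarchical softmax concrete, here is a minimal sketch of a generic two-level, class-based factorization, p(w | h) = p(c(w) | h) * p(w | c(w), h). It is not the formulation proposed in the paper. The class name TwoLevelSoftmax, the parameter names, the random weights, and the fixed assignment of word w to cluster w % num_clusters are all illustrative assumptions. In particular, the paper learns the cluster assignment during training, whereas this sketch keeps it fixed.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoLevelSoftmax:
    """Illustrative two-level softmax: pick a cluster, then a word inside it."""

    def __init__(self, vocab_size, num_clusters, hidden_dim, seed=0):
        assert vocab_size >= num_clusters, "every cluster needs at least one word"
        rng = random.Random(seed)
        # Assumption: a fixed, arbitrary word-to-cluster map (not learned).
        self.cluster_of = [w % num_clusters for w in range(vocab_size)]
        self.members = [
            [w for w in range(vocab_size) if self.cluster_of[w] == c]
            for c in range(num_clusters)
        ]
        # Assumption: small random output embeddings stand in for trained ones.
        self.cluster_vecs = [
            [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
            for _ in range(num_clusters)
        ]
        self.word_vecs = [
            [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
            for _ in range(vocab_size)
        ]

    def prob(self, word, hidden):
        """Return p(word | hidden) = p(cluster | hidden) * p(word | cluster, hidden)."""
        c = self.cluster_of[word]
        # First level: softmax over clusters only.
        p_cluster = softmax([dot(v, hidden) for v in self.cluster_vecs])[c]
        # Second level: softmax over the words of the chosen cluster only.
        members = self.members[c]
        in_cluster = softmax([dot(self.word_vecs[w], hidden) for w in members])
        return p_cluster * in_cluster[members.index(word)]


model = TwoLevelSoftmax(vocab_size=12, num_clusters=3, hidden_dim=4)
hidden = [0.5, -1.0, 0.25, 2.0]
total = sum(model.prob(w, hidden) for w in range(12))
```

Scoring one word here takes num_clusters plus one cluster's worth of dot products instead of vocab_size of them, which is where the speedup of such factorizations comes from. Because each level is a proper softmax, the word probabilities still sum to one over the vocabulary.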