Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations
Entropy2Vec addresses the feature sparsity and static snapshots of traditional typological inventories used in multilingual NLP, though the contribution appears incremental because it builds on existing language modeling and typological concepts.
The authors derive cross-lingual language representations with Entropy2Vec, which uses the entropy of monolingual language models to capture typological relationships. The resulting dense embeddings align with established typological categories and achieve competitive performance in multilingual NLP tasks.
We introduce Entropy2Vec, a novel framework for deriving cross-lingual language representations by leveraging the entropy of monolingual language models. Unlike traditional typological inventories that suffer from feature sparsity and static snapshots, Entropy2Vec uses the inherent uncertainty in language models to capture typological relationships between languages. We hypothesize that when a language model is trained on a single language, the entropy of its predictions on other languages reflects their structural similarity to the training language: low entropy indicates high similarity, while high entropy suggests greater divergence. This approach yields dense language embeddings that are adaptable to different timeframes and free from missing values. Empirical evaluations demonstrate that Entropy2Vec embeddings align with established typological categories and achieve competitive performance in downstream multilingual NLP tasks, such as those addressed by the LinguAlchemy framework.
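The entropy hypothesis can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a character bigram model with add-one smoothing as a stand-in for a monolingual language model, and it assumes that a language's embedding is the vector of average cross-entropies its text receives under each monolingual model. The function names (`train_char_bigram`, `cross_entropy`, `entropy_embeddings`) and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Count character bigrams and their left contexts in one language's text."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts


def cross_entropy(model, text, vocab_size):
    """Average bits per character of `text` under `model` (add-one smoothing)."""
    pair_counts, context_counts = model
    total_bits = 0.0
    n_predictions = 0
    for prev, cur in zip(text, text[1:]):
        # Smoothed conditional probability P(cur | prev); unseen pairs get
        # a small non-zero probability instead of zero.
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total_bits -= math.log2(p)
        n_predictions += 1
    return total_bits / n_predictions


def entropy_embeddings(corpora):
    """Map each language to its cross-entropies under every monolingual model.

    Assumption: one embedding dimension per source-language model, ordered
    alphabetically by language code.
    """
    languages = sorted(corpora)
    shared_vocab = set().union(*(set(text) for text in corpora.values()))
    vocab_size = len(shared_vocab)
    models = {lang: train_char_bigram(corpora[lang]) for lang in languages}
    return {
        target: [
            cross_entropy(models[source], corpora[target], vocab_size)
            for source in languages
        ]
        for target in languages
    }


# Toy corpora (hypothetical): "en" is English-like, "xx" is an unrelated string.
toy_corpora = {
    "en": "the cat sat on the mat the cat",
    "xx": "zzqqzzqqzzqq",
}
embeddings = entropy_embeddings(toy_corpora)
for lang, vector in embeddings.items():
    print(lang, [round(value, 3) for value in vector])
```

In this toy setting the model trained on `en` assigns lower cross-entropy to `en` text than to `xx` text, which mirrors the stated hypothesis that low entropy signals similarity. Because every language receives a value under every model, the resulting vectors are dense and have no missing entries.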