CL · Feb 16, 2019

Exploring Language Similarities with Dimensionality Reduction Technique

arXiv:1902.06092v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the issue of limited language coverage in NLP for researchers and developers, but it is incremental as it applies existing techniques to visualize language similarities without introducing new methods.

The paper tackles the problem of underrepresentation of many languages in NLP models by exploring language similarities through dimensionality reduction, enabling visualization of lexical, syntactic, and semantic relationships in 2D plots to potentially aid in model development for other languages.

In recent years, several novel models have been developed to process natural language, and accurate translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic, and semantic similarities with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several popular languages in a lower dimension so that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
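The abstract does not say which features or which reduction technique the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions: each language is described by character-bigram counts from a tiny hand-picked word list (toy data, not the paper's dataset), and the profiles are projected to two dimensions with PCA, computed here by power iteration on the centered Gram matrix. All function names are hypothetical.

```python
import math
from collections import Counter

# Toy word lists (an assumption for illustration, not the paper's data).
WORDS = {
    "English": ["water", "night", "mother", "new", "sun"],
    "German": ["wasser", "nacht", "mutter", "neu", "sonne"],
    "Spanish": ["agua", "noche", "madre", "nuevo", "sol"],
    "Italian": ["acqua", "notte", "madre", "nuovo", "sole"],
}


def bigram_counts(words):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    counts = Counter()
    for word in words:
        padded = f"^{word}$"
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return counts


def feature_matrix(word_lists):
    """Return language names and one bigram-count row per language."""
    profiles = {lang: bigram_counts(ws) for lang, ws in word_lists.items()}
    vocab = sorted(set().union(*profiles.values()))
    langs = list(profiles)
    rows = [[float(profiles[lang][b]) for b in vocab] for lang in langs]
    return langs, rows


def center_columns(rows):
    """Subtract each column's mean so every feature averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """Matrix of dot products between every pair of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def top_eigenpairs(matrix, k=2, iters=500):
    """Approximate the k largest eigenpairs by power iteration with deflation."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for idx in range(k):
        # Deterministic, non-constant start vector.
        v = [math.sin(i + 1 + idx) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(a * b for a, b in zip(v, mv))
        pairs.append((eigval, v))
        # Remove the component just found before searching for the next one.
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def project_2d(word_lists):
    """Map each language to 2D PCA coordinates of its bigram profile."""
    langs, rows = feature_matrix(word_lists)
    pairs = top_eigenpairs(gram(center_columns(rows)), k=2)
    return {
        lang: tuple(math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs)
        for i, lang in enumerate(langs)
    }


if __name__ == "__main__":
    for lang, (x, y) in project_2d(WORDS).items():
        print(f"{lang:8s} {x:8.3f} {y:8.3f}")
```

The resulting pairs can be passed to any scatter-plot tool. With word lists this small the layout is only indicative; a realistic study would need far larger corpora and may well use different features or a different reduction method.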

Code Implementations: 1 repo
