CL · Jun 3, 2021

Language Embeddings for Typology and Cross-lingual Transfer Learning

arXiv:2106.02082v1714 citations
Originality: Incremental advance
AI Analysis

The paper addresses data scarcity in cross-lingual NLP, though the advance is incremental because it builds on existing embedding methods.

Cross-lingual tasks usually depend on annotated or parallel data. The authors address this by learning language embeddings without such data. Reported results include a 0.75 correlation with WALS and competitive performance in zero-shot parsing and NLI.

Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.

Code Implementations: 1 repo
