CL · Jan 27, 2021

Multilingual and cross-lingual document classification: A meta-learning approach

arXiv:2101.11302v2 · 805 citations
Originality: Incremental advance
AI Analysis

The paper addresses the shortage of labeled data for deep learning in most of the world's languages. The contribution is incremental, since it adjusts existing meta-learning methods rather than introducing a new one.

The paper tackles document classification for under-resourced languages by proposing a meta-learning approach, achieving new state-of-the-art results on several languages with only a small amount of labeled data.

The great majority of languages in the world are considered under-resourced for the successful application of deep learning methods. In this work, we propose a meta-learning approach to document classification in limited-resource settings and demonstrate its effectiveness in two different scenarios: few-shot, cross-lingual adaptation to previously unseen languages; and multilingual joint training when limited target-language data is available during training. We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability, and show that meta-learning thrives in settings with a heterogeneous task distribution. We propose a simple yet effective adjustment to existing meta-learning methods that allows for better and more stable learning, and set a new state of the art on several languages while performing on par on others, using only a small amount of labeled data.
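To make the few-shot setting in the abstract concrete, here is a minimal sketch of episodic meta-learning. It uses a generic Reptile-style first-order update as a stand-in; it is not the adjustment the paper proposes, and the abstract does not say which methods the paper compares. Everything here is an illustrative assumption: each "language" is a toy task with a random linear labeling rule over three features, the classifier is plain logistic regression, and the names `make_task`, `adapt`, and `reptile` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w, batch):
    """Mean logistic loss and its gradient for a linear classifier w."""
    grad = [0.0] * len(w)
    loss = 0.0
    for x, y in batch:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    n = len(batch)
    return loss / n, [g / n for g in grad]


def adapt(w, support, steps=5, lr=0.1):
    """Inner loop: a few gradient steps on one task's small support set."""
    w = list(w)
    for _ in range(steps):
        _, g = loss_and_grad(w, support)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def make_task(rng, dim=3):
    """Toy stand-in for one language: a random linear labeling rule (assumption)."""
    true_w = [rng.gauss(0, 1) for _ in range(dim)]

    def sample(n):
        data = []
        for _ in range(n):
            x = [rng.uniform(-1, 1) for _ in range(dim)]
            y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
            data.append((x, y))
        return data

    return sample


def reptile(tasks, dim=3, meta_steps=200, meta_lr=0.1, k_shot=8, seed=0):
    """Outer loop: move the shared initialization toward each adapted solution."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(meta_steps):
        sample = rng.choice(tasks)          # pick one task per meta-step
        adapted = adapt(w, sample(k_shot))  # few-shot adaptation on that task
        w = [wi + meta_lr * (ai - wi) for wi, ai in zip(w, adapted)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    train_tasks = [make_task(rng) for _ in range(4)]
    w_init = reptile(train_tasks)
    unseen = make_task(rng)                 # plays the role of an unseen language
    support, query = unseen(8), unseen(50)
    print("query loss before adaptation:", loss_and_grad(w_init, query)[0])
    print("query loss after adaptation: ", loss_and_grad(adapt(w_init, support), query)[0])
```

The inner loop corresponds to adapting to a language from a handful of labeled documents, and the outer loop to learning an initialization across languages. A real system would replace the linear model with a pretrained multilingual encoder and the toy tasks with per-language document sets.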

Code Implementations: 1 repo
