CL · Apr 29, 2020

Evaluating Transformer-Based Multilingual Text Classification

Sophie Groenwold, Samhita Honnavalli, Lily Ou, Aesha Parekh, Sharon Levy, Diba Mirza, William Yang Wang

arXiv:2004.13939v2 · 1.37 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses performance disparities in NLP tools across languages with differing typological features. The contribution is incremental, building on existing multilingual analysis.

The paper tackles the unequal performance of NLP tools across languages with different typological structures. It analyzes how word order and morphological typology affect language modeling efficacy, and it reports a multi-class text classification experiment on eight languages and eight models.

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its analysis of state-of-the-art language models. As a result, NLP tools perform unequally across languages with different syntactic and morphological structures. Through a detailed discussion of word order typology, morphological typology, and comparative linguistics, we identify which variables most affect language modeling efficacy; in addition, we calculate word order and morphological similarity indices to aid our empirical study. We then use this background to support our analysis of an experiment we conduct using multi-class text classification on eight languages and eight models.
