CL · AI · LG · May 9, 2023

Effects of sub-word segmentation on performance of transformer language models

arXiv:2305.05480v3 · 135 citations
Originality: Incremental advance
AI Analysis

This work addresses the efficiency and sustainability of language models. It offers NLP practitioners incremental reductions in training and inference costs.

The paper examines how sub-word segmentation affects transformer language models. It finds that morphological segmentation algorithms such as Morfessor and StateMorph yield lower perplexity, faster convergence, and comparable or better downstream task performance. It also finds that smaller models using these algorithms can match larger models trained with BPE.

Language modeling is a fundamental task in natural language processing, which has been thoroughly explored with various architectures and hyperparameters. However, few studies focus on the effect of sub-word segmentation on the performance of language models (LMs). In this paper, we compare GPT and BERT models trained with the statistical segmentation algorithm BPE against models trained with two unsupervised algorithms for morphological segmentation: Morfessor and StateMorph. We train the models for several languages, including ones with very rich morphology. We compare their performance across different segmentation algorithms, vocabulary sizes, and model sizes. The results show that training with morphological segmentation allows the LMs to: (1) achieve lower perplexity, (2) converge more efficiently in terms of training time, and (3) achieve equivalent or better evaluation scores on downstream tasks. Lastly, we show (4) that smaller LMs using morphological segmentation can perform comparably to larger models trained with BPE, both in terms of perplexity (1) and scores on downstream tasks (3). Points (2) and (4) bear on the sustainability of LMs, since they reduce model cost in both size and computation time. Point (2) reduces cost only in the training phase, while (4) also reduces it in the inference phase.
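BPE is the baseline segmenter in this comparison. It builds its vocabulary purely from frequency statistics, whereas Morfessor and StateMorph aim for morph-like units. The sketch below shows the classic BPE merge-learning loop in the style of Sennrich et al. (2016), followed by a routine that applies the learned merges to a new word. It is not the tokenizer configuration the authors used. The function names (`learn_bpe`, `apply_bpe`, and the helpers) and the toy word counts are hypothetical and serve only as illustration.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merges (hypothetical toy learner)."""
    # Start from characters plus an end-of-word marker so merges never cross words.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_symbols(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


if __name__ == "__main__":
    # Toy word counts (hypothetical), in the spirit of the classic BPE example.
    freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(freqs, num_merges=3)
    print(merges)
    print(apply_bpe("lowest", merges))
```

The number of merges sets the vocabulary size, which is one of the variables the paper varies. In this toy run, the merges recover the suffix "est" only because it is frequent, not because the algorithm models morphology. That difference is what motivates comparing BPE with morphological segmenters.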
