CL | Apr 27, 2023

SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish

arXiv:2304.13994v3 | 4 citations | h-index: 20
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accessible and controllable language models for Swedish text generation, though it is incremental as it adapts an existing architecture to a specific language.

The authors address controllable text generation in Swedish with SweCTRL-Mini, a Transformer-based large language model built on the CTRL architecture. Genre is controlled by inserting special tokens into the prompt, and the model is small enough for inference and fine-tuning on a single consumer-grade GPU. Its generative capabilities are evaluated by human referees and compared with those of GPT-3.

We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of the SweCTRL-Mini model can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of Swedish novels. In this article, we provide (1) a detailed account of the utilized training data and text pre-processing steps, to the extent that it is possible to check whether a specific phrase/source was a part of the training data, and (2) an evaluation of the model on both discriminative tasks, using automatic evaluation methods, and generative tasks, using human referees. We also compare the generative capabilities of the model with those of GPT-3. SweCTRL-Mini is fully open and available for download.
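The genre control described in the abstract amounts to placing a special token at the start of the generation prompt. The sketch below illustrates only that prompt-building step. The control codes (`:news:`, `:wiki:`) and the helper `build_prompt` are hypothetical placeholders, not the model's actual vocabulary or API; the real tokens should be taken from the released model.

```python
# Hypothetical control codes for illustration only; the real SweCTRL-Mini
# vocabulary may use different tokens.
CONTROL_CODES = {
    "news": ":news:",
    "wiki": ":wiki:",
}


def build_prompt(genre: str, text: str = "") -> str:
    """Prepend the control code for `genre` to the prompt text."""
    if genre not in CONTROL_CODES:
        raise ValueError(f"unknown genre: {genre!r}")
    # strip() drops the trailing space when no prompt text is given
    return f"{CONTROL_CODES[genre]} {text}".strip()


if __name__ == "__main__":
    print(build_prompt("news", "Stockholm"))
```

The resulting string would then be tokenized and passed to the model's generation routine; a CTRL-style model conditions the continuation on the leading control code.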

Code Implementations: 1 repo
