Cedille: A large autoregressive French language model
This work addresses the limited zero-shot learning capabilities of language models for non-English languages, specifically French, and targets NLP researchers and practitioners. The contribution is incremental, since it adapts an existing paradigm to a new language.
The authors address the lack of large autoregressive language models for French by introducing Cedille. Cedille outperforms existing French models and is competitive with GPT-3 on French zero-shot benchmarks. It also improves language model safety (lower toxicity), which the authors attribute to dataset filtering.
Scaling up the size and training of autoregressive language models has enabled novel ways of solving Natural Language Processing tasks using zero-shot and few-shot learning. While extreme-scale language models such as GPT-3 offer multilingual capabilities, zero-shot learning for languages other than English remains largely unexplored. Here, we introduce Cedille, a large open-source autoregressive language model trained specifically for the French language. Our results show that Cedille outperforms existing French language models and is competitive with GPT-3 on a range of French zero-shot benchmarks. Furthermore, we provide an in-depth comparison of the toxicity exhibited by these models, showing that Cedille marks an improvement in language model safety thanks to dataset filtering.