DistilCamemBERT: a distillation of the French model CamemBERT
This work addresses scalability and environmental concerns for industrial adoption of French language models, but is incremental as it applies known distillation techniques to an existing model.
The paper tackles the problem of large, computationally expensive French NLP models by presenting DistilCamemBERT, which drastically reduces computational cost while preserving good performance.
Modern Natural Language Processing (NLP) models based on the Transformer architecture represent the state of the art on a wide range of tasks. However, these models are complex, with several hundred million parameters even for the smallest of them. This may hinder their adoption at the industrial level, making it difficult to scale on a reasonable infrastructure and/or to comply with societal and environmental responsibilities. To address this, we present in this paper a model that drastically reduces the computational cost of a well-known French model (CamemBERT) while preserving good performance.
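As an illustration of the general idea behind distillation, the sketch below computes a generic soft-target loss: the Kullback-Leibler divergence between temperature-softened teacher and student output distributions. It is not taken from the paper. The function names, the temperature value, and the use of this particular loss are assumptions, and the actual DistilCamemBERT training objective may differ or combine several terms.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Hypothetical helper: KL(teacher || student) on softened distributions.

    The result is multiplied by temperature**2, a common convention that keeps
    gradient magnitudes comparable across temperatures. The default temperature
    is an illustrative assumption, not a value from the paper.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what pushes a smaller student model toward the behavior of the larger teacher.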