CL LG Oct 5, 2025

Fine Tuning Methods for Low-resource Languages

Tim Bakkenes, Daniel Wang, Anton Johansson

arXiv:2510.04139v1 2.7

Originality: Synthesis-oriented

AI Analysis

This work addresses the inclusivity gap in AI for underrepresented language communities, though it is incremental as it applies existing fine-tuning techniques to new cultural contexts.

The paper tackles the underperformance of large language models in underrepresented languages by developing a method for preparing culturally relevant datasets and post-training Gemma 2. The result is improved performance in a low-resource language, which the authors present as a way to help preserve cultural heritage.

The rise of Large Language Models has not been inclusive of all cultures. The models are mostly trained on English texts and culture, which makes them underperform in other languages and cultural contexts. By developing a generalizable method for preparing culturally relevant datasets and post-training the Gemma 2 model, this project aimed to increase the performance of Gemma 2 for an underrepresented language and to showcase how others can do the same to unlock the power of Generative AI in their country and preserve their cultural heritage.
