Expanding Foundational Language Capabilities in Open-Source LLMs through a Korean Case Study
This addresses the need for better Korean language support in AI models, but the contribution is incremental: it builds on an existing architecture (Llama 3) and targets a single language.
The paper tackles the problem of enhancing Korean language capabilities in open-source LLMs while preserving English performance. The resulting model, Llama-3-Motif, achieves results comparable to GPT-4 on Korean benchmarks.
We introduce Llama-3-Motif, a 102-billion-parameter language model designed to enhance Korean capabilities while retaining strong English performance. Built on the Llama 3 architecture, Llama-3-Motif uses training techniques including LlamaPro and Masked Structure Growth to scale the model without altering its core Transformer architecture. We trained it on the MoAI platform, which supports efficient training across hyperscale GPU clusters, using a carefully curated dataset with a balanced ratio of Korean and English data. On Korean-specific benchmarks, Llama-3-Motif outperforms existing models and achieves results comparable to GPT-4.
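The abstract names LlamaPro and Masked Structure Growth without describing them. Below is a toy Python sketch, under stated assumptions, of the idea both techniques share as described in their original papers: grow a model's depth by inserting new layers that are initialised so the enlarged model computes exactly the same function as before, then continue training. The `Block` class is a hypothetical residual MLP standing in for a Transformer layer, and the insertion interval, helper names, and linear mask schedule are illustrative assumptions. How Llama-3-Motif actually combines the two techniques is not specified here.

```python
import copy
import random

# Hypothetical stand-in for a Transformer layer (NOT Llama 3's real layer):
# a residual MLP block  y = x + mask * W_out @ relu(W_in @ x).

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class Block:
    def __init__(self, W_in, W_out, mask=1.0, trainable=False):
        self.W_in, self.W_out = W_in, W_out
        self.mask = mask            # scalar gate on the residual branch
        self.trainable = trainable  # LlamaPro trains only the inserted blocks

    def forward(self, x):
        hidden = [max(0.0, a) for a in matvec(self.W_in, x)]
        branch = matvec(self.W_out, hidden)
        return [xi + self.mask * bi for xi, bi in zip(x, branch)]

def run(blocks, x):
    for block in blocks:
        x = block.forward(x)
    return x

def expand_zero_init(blocks, every):
    """LlamaPro-style depth expansion: after every `every` original blocks,
    insert a copy of the preceding block with its output projection zeroed,
    so the new block starts as an identity map."""
    out = []
    for i, block in enumerate(blocks, start=1):
        out.append(block)
        if i % every == 0:
            new = copy.deepcopy(block)
            new.W_out = [[0.0] * len(row) for row in new.W_out]
            new.trainable = True
            out.append(new)
    return out

def expand_masked(blocks, every):
    """Simplified MSG-style depth expansion: insert copies whose residual
    branch is masked to 0; the mask is then ramped towards 1 in training."""
    out = []
    for i, block in enumerate(blocks, start=1):
        out.append(block)
        if i % every == 0:
            new = copy.deepcopy(block)
            new.mask, new.trainable = 0.0, True
            out.append(new)
    return out

def mask_at(step, ramp_steps):
    """Hypothetical linear mask schedule from 0 to 1 over `ramp_steps`."""
    return min(1.0, step / ramp_steps)

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Usage: both expansions leave the model's output unchanged at initialisation.
random.seed(0)
dim, width, n_layers = 4, 8, 6
base = [Block(rand_matrix(width, dim), rand_matrix(dim, width))
        for _ in range(n_layers)]
x = [random.uniform(-1, 1) for _ in range(dim)]

y_base = run(base, x)
y_zero = run(expand_zero_init(base, every=2), x)
y_mask = run(expand_masked(base, every=2), x)
print(len(expand_zero_init(base, every=2)))  # 9 blocks: 6 original + 3 inserted
print([mask_at(s, 100) for s in (0, 50, 100, 200)])  # [0.0, 0.5, 1.0, 1.0]
```

Zeroing the output projection (the LlamaPro route) and gating the new branch with a mask (the MSG route) are two ways to reach the same function-preserving starting point. The masked variant keeps the copied weights non-zero, so the inserted layer begins contributing as soon as its mask rises above zero.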