LG · Mar 12

Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization

arXiv:2603.18037 · 53.21 citations · h-index: 5

Predicted impact: top 46% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This paper provides actionable guidance for developing compact Japanese specialist language models on consumer hardware, with a methodology aimed at low-resource technical domains.

The paper addresses how to build domain-specific Japanese small language models by determining the optimal training scale, the best base model, and an architecture-aware quantization scheme. Its recommended configuration, Swallow-8B with Q4_K_M quantization, scores 2.830/3 at 8.9 seconds per question and a model size of 4.9 GB.

This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments (1k to 5k samples) identify n=4,000 as optimal, where test-set NLL reaches minimum (1.127) before overfitting at 5k samples. Stage 2 (Compare finetuned SLMs): Comparing four Japanese LLMs shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models (Qwen2.5-7B). Stage 3 (Quantization): Llama-3 architectures improve under Q4_K_M quantization, while GQA architectures degrade severely (Qwen2.5: -0.280 points). Production recommendation: Swallow-8B Q4_K_M achieves 2.830/3 score, 8.9 s/question, 4.9 GB size. The methodology generalizes to low-resource technical domains and provides actionable guidance for compact Japanese specialist LMs on consumer hardware.
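The Stage 1 selection rule described in the abstract (choose the training-set size at which test-set NLL is lowest) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `mean_nll` and `pick_training_scale` are hypothetical, and every NLL value in the example dictionary is an invented placeholder except the n=4,000 entry (1.127), which is the only figure the abstract reports.

```python
import math


def mean_nll(token_probs):
    """Average negative log-likelihood (nats per token), given the
    probabilities a model assigned to the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def pick_training_scale(nll_by_n):
    """Return the training-sample count whose test-set NLL is lowest."""
    return min(nll_by_n, key=nll_by_n.get)


# PLACEHOLDER DATA: only the 4000 -> 1.127 entry comes from the abstract.
# The other values are invented to show the shape of a scale-learning sweep
# (NLL falls, reaches a minimum, then rises as the model overfits).
nll_by_n = {1000: 1.30, 2000: 1.22, 3000: 1.16, 4000: 1.127, 5000: 1.15}

best_n = pick_training_scale(nll_by_n)
print(f"selected training scale: n={best_n}")
```

In practice the NLL for each run would be computed on a held-out test set from the fine-tuned model's token probabilities, which is the role `mean_nll` plays here.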
