LGMar 12

Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization

arXiv:2603.1803753.21 citationsh-index: 5
Predicted impact top 46% in LG · last 90 daysOriginality Incremental advance
AI Analysis

This provides actionable guidance for developing compact Japanese specialist language models on consumer hardware, addressing low-resource technical domains.

The paper tackles the problem of building domain-specific Japanese small language models by determining optimal training scale, base-model selection, and architecture-aware quantization, resulting in Swallow-8B Q4_K_M achieving a score of 2.830/3 with 8.9 seconds per question and 4.9 GB size.

This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments (1k--5k samples) identify n=4,000 as optimal, where test-set NLL reaches minimum (1.127) before overfitting at 5k samples. Stage 2 (Compare finetuned SLMs): Comparing four Japanese LLMs shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models (Qwen2.5-7B). Stage 3 (Quantization): Llama-3 architectures improve under Q4_K_M quantization, while GQA architectures degrade severely (Qwen2.5: -0.280 points). Production recommendation: Swallow-8B Q4_K_M achieves 2.830/3 score, 8.9 s/question, 4.9 GB size. The methodology generalizes to low-resource technical domains and provides actionable guidance for compact Japanese specialist LMs on consumer hardware.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes