CL, AI, LG, Dec 21, 2024

Technical Report: Small Language Model for Japanese Clinical and Medicine

arXiv:2412.16423v1
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient language models in Japanese clinical and medical applications, though the advance is incremental because it builds on existing SLM approaches.

The researchers developed a small language model (NCVC-slm-1) for Japanese clinical and medical text, which achieved the highest scores on 6 out of 8 tasks on the JMED-LLM benchmark after fine-tuning.

This report presents a small language model (SLM) for Japanese clinical and medical text, named NCVC-slm-1. This 1B-parameter model was trained on Japanese text classified as high quality. NCVC-slm-1 was also augmented with clinical and medical content covering a variety of diseases, drugs, and examinations. Using carefully designed pre-processing together with a specialized morphological analyzer and tokenizer, this small, lightweight model not only generated text but also indicated the feasibility of understanding clinical and medical text. Compared with large language models, the fine-tuned NCVC-slm-1 achieved the highest scores on 6 of the 8 tasks in JMED-LLM. This result indicates that an SLM can feasibly perform several downstream tasks in the clinical and medical field. Hopefully, NCVC-slm-1 will contribute to developing and accelerating the clinical and medical field toward a bright future.
