CL CE · Apr 14, 2024

JaFIn: Japanese Financial Instruction Dataset

Kota Tanabe, Masahiro Suzuki, Hiroki Sakaji, Itsuki Noda

arXiv:2404.09260v23.44 citations · h-index: 14 · CIFEr

Originality: Synthesis-oriented

AI Analysis

This work addresses domain adaptation for LLMs in Japanese finance, but it is incremental because it applies existing instruction-tuning methods to a new dataset.

The authors address domain adaptation of large language models (LLMs) to Japanese finance by constructing JaFIn, a manually curated instruction dataset. They report that models instruction-tuned on JaFIn outperform their original counterparts in both a quantitative benchmark evaluation and qualitative response comparisons.

We construct an instruction dataset for large language models (LLMs) in the Japanese finance domain. Domain adaptation of language models, including LLMs, is receiving more attention as language models become more popular. This study demonstrates the effectiveness of domain adaptation through instruction tuning. To achieve this, we propose a Japanese instruction-tuning dataset called JaFIn, the Japanese Financial Instruction Dataset. JaFIn is manually constructed from multiple data sources, including Japanese government websites, which provide extensive financial knowledge. We then use JaFIn to instruction-tune several LLMs, demonstrating that our finance-specialized models have better domain adaptability than the original models. The resulting finance-specialized LLMs were evaluated using a quantitative Japanese financial benchmark and qualitative response comparisons, showing improved performance over the originals.

