CL · AI · CE · Sep 19, 2023

CFGPT: Chinese Financial Assistant with Large Language Model

arXiv:2309.10654v2 · 22 citations · h-index: 25 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for specialized AI assistants in the Chinese financial domain, but it is incremental as it adapts existing methods to new data.

The authors address the application of large language models to Chinese financial tasks with CFGPT, a framework with three parts: a dataset (CFData) with 584M documents and 141B tokens for pre-training and 1.5M instruction pairs for fine-tuning; a model (CFLLM) based on InternLM-7B; and a deployment tool (CFAPP). The result is a system designed to handle financial texts and applications.

Large language models (LLMs) have demonstrated great potential in natural language processing tasks within the financial domain. In this work, we present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT, which includes a dataset (CFData) for pre-training and supervised fine-tuning, a financial LLM (CFLLM) to adeptly manage financial texts, and a deployment framework (CFAPP) designed to navigate real-world financial applications. CFData comprises both a pre-training dataset and a supervised fine-tuning dataset. The pre-training dataset collates Chinese financial data and analytics, alongside a smaller subset of general-purpose text, with 584M documents and 141B tokens in total. The supervised fine-tuning dataset is tailored for six distinct financial tasks, embodying various facets of financial analysis and decision-making, with 1.5M instruction pairs and 1.5B tokens in total. CFLLM, which is based on InternLM-7B to balance model capability and size, is trained on CFData in two stages: continued pre-training and supervised fine-tuning. CFAPP is centered on large language models (LLMs) and augmented with additional modules to ensure multifaceted functionality in real-world applications. Our code is released at https://github.com/TongjiFinLab/CFGPT.
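To put the dataset sizes quoted in the abstract in perspective, the short sketch below derives a few averages from them. It uses only the rounded figures stated above (584M documents, 141B tokens, 1.5M instruction pairs, 1.5B tokens); the variable and function names are illustrative and do not come from the CFGPT codebase, and the results are rough estimates rather than reported statistics.

```python
# Rounded figures as quoted in the abstract; the names are illustrative only.
PRETRAIN_DOCS = 584e6      # pre-training documents
PRETRAIN_TOKENS = 141e9    # pre-training tokens
SFT_PAIRS = 1.5e6          # supervised fine-tuning instruction pairs
SFT_TOKENS = 1.5e9         # supervised fine-tuning tokens


def mean_tokens(total_tokens: float, n_items: float) -> float:
    """Average number of tokens per item (document or instruction pair)."""
    return total_tokens / n_items


# Average length of a pre-training document and of an instruction pair.
pretrain_mean = mean_tokens(PRETRAIN_TOKENS, PRETRAIN_DOCS)
sft_mean = mean_tokens(SFT_TOKENS, SFT_PAIRS)

# Fraction of all training tokens that belong to the fine-tuning stage.
sft_share = SFT_TOKENS / (PRETRAIN_TOKENS + SFT_TOKENS)

print(f"mean tokens per pre-training document: {pretrain_mean:.0f}")
print(f"mean tokens per instruction pair:      {sft_mean:.0f}")
print(f"fine-tuning share of all tokens:       {sft_share:.2%}")
```

Under these rounded inputs, a pre-training document averages roughly 241 tokens, an instruction pair averages about 1,000 tokens, and the fine-tuning stage accounts for about 1% of all training tokens. This suggests the pre-training corpus is dominated by short texts, while the instruction pairs are considerably longer on average.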

Code Implementations: 1 repo
