CL AI CE · Jun 27, 2024

FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus

Yuxin Fu, Shijing Si, Leyi Mai, Xi-ang Li

arXiv:2406.18856v1 · 1.0 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for accurate financial translation by providing a dataset and analysis for optimizing LLMs, though it is incremental: it focuses on benchmarking rather than proposing new methods.

The authors constructed a fine-grained Chinese-English parallel corpus of financial news (FFN) with 1,013 manually corrected main texts and 809 titles, and used it to evaluate the translation quality of ChatGPT and ERNIE-bot against an OpenNMT model, identifying problems with LLMs in this domain.

Large Language Models (LLMs) have stunningly advanced the field of machine translation, though their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles published between January 1, 2014 and December 31, 2023 from mainstream media websites such as CNN, FOX, and China Daily. The dataset consists of 1,013 main texts and 809 titles, all of which have been manually corrected. We measured the translation quality of two LLMs, ChatGPT and ERNIE-bot, using BLEU, TER, and chrF scores as the evaluation metrics. For comparison, we also trained an OpenNMT model on our dataset. We detail the problems of LLMs and provide in-depth analysis, intending to stimulate further research and solutions in this largely uncharted territory. Our research underlines the need to optimize LLMs within the specific field of financial translation to ensure accuracy and quality.
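
The abstract names BLEU, TER, and chrF as evaluation metrics but does not describe how they were computed. As a rough illustration of what one of them measures, here is a minimal sketch of a simplified sentence-level chrF (character n-gram F-score). It is not the authors' evaluation code, and it omits details that standard toolkits such as sacreBLEU handle. The function names, the maximum n-gram order of 6, and beta = 2 are assumptions chosen for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified chrF (hypothetical helper, not a library API).

    Averages character n-gram precision and recall over orders
    1..max_order, then combines them as an F-beta score in [0, 1].
    Orders for which either string has no n-grams are skipped.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        # Clipped overlap: each n-gram counts at most as often as in both.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Illustrative strings only; they are not taken from the FFN corpus.
    reference = "The central bank cut interest rates"
    candidate = "The central bank lowered interest rates"
    print(f"simplified chrF: {simple_chrf(candidate, reference):.3f}")
```

Because it works on characters rather than words, a score of this kind gives partial credit for near matches such as inflected forms, which is one reason it is often reported alongside BLEU and TER.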
