CL AI Jun 26, 2023

Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

arXiv:2306.14514v2 0.5 h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses formality in machine translation for specific languages, but it appears incremental as it builds on existing data-centric techniques.

The paper tackles formality-sensitive machine translation for four target languages with a data-driven approach that combines language-specific data handling and synthetic data generation, and reports a considerable improvement over the baseline.

In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages. Our methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. This approach demonstrates a considerable improvement over the baseline, highlighting the effectiveness of data-centric techniques. Our prompt engineering strategy further improves performance by producing superior synthetic translation examples.
