CL | Nov 24, 2021

A Self-Supervised Automatic Post-Editing Data Generation Tool

arXiv:2111.12284v2
AI Analysis

This tool reduces the human effort needed to build data for automatic post-editing (APE), facilitating data-centric research on understudied language pairs.

The paper tackles the labor-intensive creation of data for automatic post-editing by developing a self-supervised tool that generates personalized APE data from parallel corpora for several language pairs with English as the target language, enabling research on language pairs that previously lacked suitable data.

Building data for automatic post-editing (APE) requires extensive, expert-level human effort, because it involves an elaborate process of identifying errors in sentences and providing suitable revisions. Hence, we develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. With this tool, data-centric APE research can be conducted on many language pairs that have not been studied thus far owing to the lack of suitable data.
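
The abstract does not spell out how the tool turns a parallel corpus into APE data, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that an APE example is a (source, machine translation, post-edit) triplet, and that a pseudo machine translation can be obtained by randomly corrupting the clean English target. The names `ApeTriplet`, `add_noise`, and `build_triplets`, the noise operations, and the default noise rate are all hypothetical.

```python
import random
from typing import List, NamedTuple, Tuple


class ApeTriplet(NamedTuple):
    src: str  # source-language sentence
    mt: str   # pseudo machine translation (noised English target; an assumption)
    pe: str   # post-edit reference (the clean English target)


def add_noise(tokens: List[str], rng: random.Random, p: float = 0.15) -> List[str]:
    """Corrupt tokens with random deletions, duplications, and adjacent swaps.

    Each token is affected with total probability p, split evenly across
    the three (hypothetical) noise operations.
    """
    noised: List[str] = []
    for tok in tokens:
        r = rng.random()
        if r < p / 3:
            continue  # deletion: drop the token
        noised.append(tok)
        if r < 2 * p / 3:
            noised.append(tok)  # duplication: repeat the token
        elif r < p and len(noised) >= 2:
            # swap: exchange the token with its predecessor
            noised[-1], noised[-2] = noised[-2], noised[-1]
    # Avoid returning an empty sentence if every token was deleted.
    return noised if noised else list(tokens)


def build_triplets(
    pairs: List[Tuple[str, str]], seed: int = 0, p: float = 0.15
) -> List[ApeTriplet]:
    """Turn (source, English target) pairs into pseudo APE triplets."""
    rng = random.Random(seed)  # fixed seed keeps the generated data reproducible
    triplets = []
    for src, tgt in pairs:
        mt = " ".join(add_noise(tgt.split(), rng, p))
        triplets.append(ApeTriplet(src=src, mt=mt, pe=tgt))
    return triplets


if __name__ == "__main__":
    corpus = [("Das ist ein kleiner Test .", "This is a small test .")]
    for triplet in build_triplets(corpus, seed=1, p=0.5):
        print(triplet)
```

In this sketch the noise rate `p` stands in for the "personalized" aspect mentioned in the abstract: a user could tune it to control how heavily the pseudo translations are corrupted. Whether the actual tool exposes such a parameter is not stated in the text.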
