INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion
INarIG improves translation efficiency in scenarios where machine translation quality is insufficient, though it is an incremental advance on a domain-specific task.
The paper tackles the Word-Level Auto Completion (WLAC) task for computer-aided translation by proposing the INarIG model, which achieves state-of-the-art results on benchmark datasets, with a maximum increase of over 10% in prediction accuracy.
Computer-aided translation (CAT) aims to enhance human translation efficiency and remains important in scenarios where machine translation cannot meet quality requirements. One fundamental task within this field is Word-Level Auto Completion (WLAC). WLAC predicts a target word given a source sentence, translation context, and a human-typed character sequence. Previous works either employ word classification models to exploit contextual information from both sides of the target word or disregard the dependencies from the right-side context altogether. Furthermore, the key information, i.e., the human-typed sequence, is only used as a prefix constraint in the decoding module. In this paper, we propose the INarIG (Iterative Non-autoregressive Instruct Generation) model, which constructs the human-typed sequence into an Instruction Unit and employs iterative decoding with subwords to fully utilize the input information given in the task. Our model is more competent in dealing with low-frequency words (the core scenario of this task), and achieves state-of-the-art results on the WMT22 and benchmark datasets, with a maximum increase of over 10% in prediction accuracy.
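
To make the task setup concrete, the following is a minimal sketch of the WLAC inputs and of the prefix-constraint style of decoding that the abstract attributes to previous works. It does not implement INarIG. The names `WlacExample` and `predict_with_prefix_constraint`, the field names, and the candidate scores are all hypothetical and chosen only for illustration; in a real system the scores would come from a trained model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WlacExample:
    """One WLAC query (field names are illustrative assumptions)."""
    source: str         # source sentence
    left_context: str   # translated text to the left of the target word
    right_context: str  # translated text to the right of the target word
    typed: str          # characters the translator has typed so far


def predict_with_prefix_constraint(
    example: WlacExample, word_scores: Dict[str, float]
) -> Optional[str]:
    """Return the highest-scoring candidate that starts with the typed prefix.

    The typed sequence is used only as a filter over candidates here,
    which is the limitation the abstract points out in earlier approaches.
    """
    candidates = {
        word: score
        for word, score in word_scores.items()
        if word.startswith(example.typed)
    }
    if not candidates:
        return None  # no candidate matches the typed characters
    return max(candidates, key=candidates.get)


# Hypothetical query and made-up scores, for illustration only.
example = WlacExample(
    source="Das Haus ist alt.",
    left_context="The",
    right_context="is old.",
    typed="ho",
)
scores = {"house": 0.5, "home": 0.3, "cat": 0.9}
prediction = predict_with_prefix_constraint(example, scores)
```

In this sketch the typed characters never influence the scores themselves; they only remove candidates after scoring. The abstract's proposal differs in that the typed sequence is fed to the model as an Instruction Unit and the word is generated iteratively over subwords.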