CL, AI, CY · Aug 5, 2021

gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena

arXiv:2108.02854v2 · 713 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a domain-specific resource for machine translation researchers and developers working on natural gender phenomena. Its contribution is incremental: it introduces a new dataset rather than new methods.

The paper addresses the lack of resources for resolving natural gender ambiguities in machine translation. It introduces gENder-IT, an English-Italian parallel challenge set that pairs word-level gender tags on the English source with alternative gendered translations on the Italian target, aimed at the ambiguities that arise from cross-linguistic differences in gender marking.

Languages differ in terms of the absence or presence of gender features, the number of gender classes, and whether and where gender features are explicitly marked. These cross-linguistic differences can lead to ambiguities that are difficult to resolve, especially for sentence-level MT systems. Identifying such ambiguity and subsequently resolving it is a challenging task for which no specific resources or challenge sets are currently available. In this paper, we introduce gENder-IT, an English-Italian challenge set focusing on the resolution of natural gender phenomena by providing word-level gender tags on the English source side and, where needed, multiple gender-alternative translations on the Italian target side.
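To make the annotation scheme more concrete, here is a minimal Python sketch of how a single challenge-set entry might be represented and how an MT output could be checked against its acceptable alternatives. Everything in it is hypothetical: the `ChallengeItem` class, the `M`/`F`/`A` tag labels, and the exact-match `accepts` check are illustrative assumptions, not the dataset's actual file format or evaluation protocol. The example uses the English word "doctor", which carries no gender marking, whereas an Italian translation must commit to either il dottore (masculine) or la dottoressa (feminine).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tag labels, assumed for illustration only (not the dataset's real labels):
#   "M" = masculine referent, "F" = feminine referent,
#   "A" = gender ambiguous given the sentence context.
AMBIGUOUS = "A"


@dataclass
class ChallengeItem:
    """One hypothetical English-Italian challenge-set entry."""

    source: str                      # English source sentence
    tags: dict[str, str]             # source word -> hypothetical gender tag
    references: list[str] = field(default_factory=list)  # acceptable Italian translations

    def is_ambiguous(self) -> bool:
        # Treat the item as ambiguous if any tagged word carries the "A" tag.
        return AMBIGUOUS in self.tags.values()


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial differences are ignored.
    return " ".join(text.lower().split())


def accepts(item: ChallengeItem, hypothesis: str) -> bool:
    """Return True if the MT output exactly matches one reference after normalization."""
    return normalize(hypothesis) in {normalize(ref) for ref in item.references}


# Example: "doctor" is unmarked for gender in English, so both Italian
# gender alternatives are listed as acceptable translations.
example = ChallengeItem(
    source="The doctor arrived late.",
    tags={"doctor": AMBIGUOUS},
    references=["Il dottore è arrivato tardi.", "La dottoressa è arrivata tardi."],
)

print(example.is_ambiguous())                               # True
print(accepts(example, "la dottoressa è arrivata tardi."))  # True
print(accepts(example, "Il medico è arrivato tardi."))      # False
```

Exact matching after lowercasing and whitespace normalization is deliberately strict. A real evaluation would likely need to tolerate lexical variation (for instance il medico instead of il dottore), which this sketch simply counts as a mismatch.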

