CL May 8, 2018

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

arXiv:1805.03122v1, 100 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of making gender prediction systems generalize across languages and topics. The advance is rated incremental because the approach is positioned as an alternative to existing cross-lingual embedding methods rather than a new problem setting.

The paper tackles language- and topic-dependent gender prediction by transforming lexical strings into abstract features. These bleached features improve cross-lingual transfer over lexical models, and human performance is found to be similar to that of the bleached models.

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.
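To make "bleaching" concrete, the following is a minimal sketch of turning tokens into abstract, language-independent features. The abstract above does not list the features used, so the three shown here (character shape, token length, and a vowel/consonant pattern) and the names `shape`, `vowel_consonant`, and `bleach` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the feature set and function names are assumptions,
# not taken from the paper's code.

ASCII_VOWELS = set("aeiouAEIOU")  # simplification: accented vowels count as consonants


def shape(token):
    """Map each character to a class (U upper, L lower, D digit, X other)
    and collapse consecutive repeats, e.g. 'Hello' -> 'UL'."""
    classes = []
    for ch in token:
        if ch.isupper():
            cls = "U"
        elif ch.islower():
            cls = "L"
        elif ch.isdigit():
            cls = "D"
        else:
            cls = "X"
        if not classes or classes[-1] != cls:
            classes.append(cls)
    return "".join(classes)


def vowel_consonant(token):
    """Replace vowels with V, other letters with C, everything else with O."""
    pattern = []
    for ch in token:
        if ch in ASCII_VOWELS:
            pattern.append("V")
        elif ch.isalpha():
            pattern.append("C")
        else:
            pattern.append("O")
    return "".join(pattern)


def bleach(text):
    """Whitespace-tokenize and return one abstract feature dict per token."""
    return [
        {"shape": shape(tok), "length": len(tok), "vc": vowel_consonant(tok)}
        for tok in text.split()
    ]


if __name__ == "__main__":
    # English 'Hello' and Dutch 'Hallo' differ lexically but bleach identically.
    print(bleach("Hello world!!"))
    print(bleach("Hallo wereld!!"))
```

Because the lexical string is discarded, tokens from different languages can map to the same representation, which is the property that would let a classifier trained on one language be applied to another.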

Code Implementations: 1 repo
