CL, Jul 3, 2017

Including Dialects and Language Varieties in Author Profiling

arXiv:1707.00621v1, 16 citations
Originality: Synthesis-oriented
AI Analysis

This work aims to improve author profiling accuracy, a problem of interest to researchers and practitioners in natural language processing. Its contribution is incremental, since it applies existing methods to a specific dataset.

The paper profiles authors by predicting their gender and language variety, achieving 75% average accuracy in gender identification on tweets written in four languages and 97% accuracy in language variety identification for Portuguese.

This paper presents a computational approach to author profiling taking gender and language variety into account. We apply an ensemble system with the output of multiple linear SVM classifiers trained on character and word $n$-grams. We evaluate the system using the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% average accuracy on gender identification on tweets written in four languages and 97% accuracy on language variety identification for Portuguese.
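The feature types and the ensemble step named in the abstract can be illustrated with a minimal, standard-library sketch. It only shows how character and word n-grams might be counted and how the labels from several classifiers might be combined. The function names, the whitespace tokenisation, and the majority-vote rule are assumptions made for illustration; the abstract does not say how the SVM outputs are combined, and no SVM is trained here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams, assuming lowercasing and whitespace tokenisation."""
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers.

    Assumed combination rule: the most frequent label wins, and ties go
    to the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical labels from three classifiers, each imagined to be
# trained on a different n-gram feature set.
votes = ["pt-BR", "pt-PT", "pt-BR"]
print(majority_vote(votes))
```

In a fuller pipeline, each n-gram count would typically be turned into a feature vector and passed to its own linear SVM, with the per-classifier labels then fed to a combination step such as the one sketched here.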

