Including Dialects and Language Varieties in Author Profiling
This work aims to improve author profiling accuracy for researchers and practitioners in natural language processing. Its contribution is incremental, since it applies existing methods to a specific dataset.
The paper tackles author profiling with respect to gender and language variety, reporting 75% average accuracy in gender identification across four languages and 97% accuracy in language variety identification for Portuguese.
This paper presents a computational approach to author profiling that takes gender and language variety into account. We apply an ensemble system that combines the outputs of multiple linear SVM classifiers trained on character and word $n$-grams. We evaluate the system on the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% average accuracy for gender identification on tweets written in four languages and 97% accuracy for language variety identification for Portuguese.
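As a rough illustration of the two feature types and of how classifier outputs might be combined, the sketch below extracts character and word $n$-gram counts and merges per-classifier labels by majority vote. The function names (`char_ngrams`, `word_ngrams`, `majority_vote`), the whitespace tokenization, the lowercasing of words, and the majority-vote fusion rule are all assumptions made for this example; the abstract does not specify the preprocessing or how the ensemble combines its members, and the SVM training step itself is omitted.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text (case preserved)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count every contiguous word n-gram, using lowercased whitespace tokens.

    Whitespace tokenization and lowercasing are assumptions of this sketch.
    """
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions.

    Hypothetical fusion rule: the source does not state how outputs are
    combined. Ties are resolved in favor of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    tweet = "Bom dia bom dia"
    print(char_ngrams(tweet, 3))
    print(word_ngrams(tweet, 2))
    # Labels that three separately trained classifiers might emit.
    print(majority_vote(["female", "male", "female"]))
```

In a full system, each feature type would be vectorized and passed to its own linear SVM, and the per-classifier labels (or scores) would then be fused as in `majority_vote`.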