CL, Sep 9, 2019

BERT-Based Arabic Social Media Author Profiling

arXiv:1909.04181v3

Originality: Synthesis-oriented

AI Analysis

This work addresses automated author profiling for Arabic social media users. Its contribution is incremental: it applies existing BERT methods together with data augmentation.

The paper tackles author profiling for Arabic social media by fine-tuning BERT models to detect age, language variety (dialect), and gender, achieving accuracies of 54.72% for age, 93.75% for dialect, and 81.67% for gender.

We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA). We build simple models based on pre-trained Bidirectional Encoder Representations from Transformers (BERT). We first fine-tune the pre-trained BERT model on each of the three datasets using the data released for the shared task. We then augment the shared task data with in-house data for gender and dialect, showing the utility of augmenting training data. Our best models on the shared task test data are obtained by majority voting over several BERT models trained under different data conditions. We achieve 54.72% accuracy for age, 93.75% for dialect, 81.67% for gender, and 40.97% joint accuracy across the three tasks.
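The majority-voting step and the joint accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `majority_vote` and `joint_accuracy`, the tie-breaking rule (prefer the earliest-listed model among tied labels), and the reading of joint accuracy as "all three tasks correct for the same author" are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions: list of lists, one inner list of labels per model,
    all of the same length. Ties are broken in favour of the label
    predicted by the earliest-listed model (an assumption; the paper
    does not specify a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of items")

    voted = []
    for labels in zip(*predictions):  # labels for one item, across models
        counts = Counter(labels)
        top = max(counts.values())
        # First label (in model order) that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


def joint_accuracy(gold, pred):
    """Fraction of items for which every task is predicted correctly.

    gold, pred: dicts mapping a task name (e.g. "age", "dialect",
    "gender") to a list of labels, aligned by item index.
    """
    tasks = list(gold)
    n = len(gold[tasks[0]])
    correct = sum(
        all(gold[t][i] == pred[t][i] for t in tasks) for i in range(n)
    )
    return correct / n
```

Under this reading, joint accuracy can never exceed the lowest single-task accuracy, which is consistent with the reported 40.97% joint score falling below the 54.72% age score.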
