Multi-Task Bidirectional Transformer Representations for Irony Detection
This work addresses irony detection in Arabic with an incremental approach: multi-task fine-tuning of a pre-trained model, followed by further pre-training on in-domain data. It is relevant to NLP researchers working with low-resource languages.
The authors tackled the problem of Arabic irony detection with limited training data by fine-tuning BERT in a multi-task setting and further pre-training it on in-domain data to address dialect mismatch, achieving an 82.4 macro F1 score.
Supervised deep learning requires large amounts of training data. In the context of the FIRE2019 Arabic irony detection shared task (IDAT@FIRE2019), we show how we mitigate this need by fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model on gold data in a multi-task setting. We improve our models further by continuing to pre-train BERT on 'in-domain' data, thus alleviating an issue of dialect mismatch in the Google-released BERT model. Our best model achieves a macro F1 score of 82.4 and has the unique advantage of being feature-engineering free (i.e., based exclusively on deep learning).
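For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1 score can be computed: F1 is calculated per class and then averaged without weighting by class frequency. The function name `macro_f1`, the label strings, and the toy predictions are illustrative assumptions; they are not the shared task's official scorer or data.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
gold = ["ironic", "ironic", "not_ironic", "not_ironic", "not_ironic"]
pred = ["ironic", "not_ironic", "not_ironic", "not_ironic", "ironic"]
score = macro_f1(gold, pred)
```

Because each class contributes equally, a model that neglects the minority class is penalized more heavily under macro F1 than under accuracy.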