CL · LG · ML · Jun 17, 2019

Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

arXiv:1906.09325v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses hate speech detection in a low-resource language (Polish), but it is incremental as it applies existing methods to a new dataset.

The authors tackle hate speech detection in Polish by fine-tuning pre-trained ULMFiT and BERT models and by using TPOT for automated feature engineering. They placed second in subtask 2 of PolEval 2019 Task 6 with a logistic regression model found by TPOT.

This paper presents our contribution to PolEval 2019 Task 6: Hate speech and bullying detection. We describe three parallel approaches that we followed: fine-tuning a pre-trained ULMFiT model to our classification task, fine-tuning a pre-trained BERT model to our classification task, and using the TPOT library to find the optimal pipeline. We present results achieved by these three tools and review their advantages and disadvantages in terms of user experience. Our team placed second in subtask 2 with a shallow model found by TPOT: a logistic regression classifier with non-trivial feature engineering.
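To make the idea of a shallow model concrete, here is a minimal sketch of a logistic regression classifier over hand-built text features. It is not the pipeline that TPOT found for the authors: the character-trigram features, the hyperparameters, the function names (`char_ngrams`, `train_logreg`, `predict_proba`), and the four English toy sentences are all assumptions made up for illustration, and the code uses only the Python standard library rather than TPOT itself.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, features):
    """Linear score: bias plus weighted sum of n-gram counts."""
    return bias + sum(weights.get(k, 0.0) * v for k, v in features.items())


def train_logreg(texts, labels, epochs=200, lr=0.1, l2=0.01):
    """Fit L2-regularised logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    featurised = [char_ngrams(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(featurised, labels):
            err = sigmoid(score(weights, bias, x)) - y  # gradient of log-loss w.r.t. score
            bias -= lr * err
            for k, v in x.items():
                w = weights.get(k, 0.0)
                weights[k] = w - lr * (err * v + l2 * w)
    return weights, bias


def predict_proba(model, text):
    """Probability that `text` belongs to the positive (harmful) class."""
    weights, bias = model
    return sigmoid(score(weights, bias, char_ngrams(text)))


# Made-up toy data (assumption): 1 = harmful, 0 = harmless.
toy_texts = [
    "you are awful",
    "you are terrible and awful",
    "have a nice day",
    "what a nice and lovely day",
]
toy_labels = [1, 1, 0, 0]

model = train_logreg(toy_texts, toy_labels)
```

Calling `predict_proba(model, "you are awful")` returns the model's estimated probability for the positive class. In an automated search such as the one the abstract describes, the feature extraction step and the classifier settings would be chosen by the tool rather than fixed by hand as they are here.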

