CL AI LG · Sep 6, 2021

Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

Niclas Hildebrandt, Benedikt Boenninghoff, Dennis Orth, Christopher Schymura

arXiv:2109.02383v2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of moderating online comments for fact-checking prioritization, but it is incremental as it applies existing methods to a new shared task.

The paper tackles the identification of toxic, engaging, and fact-claiming comments in German online content using a feature-engineering approach with classifier ensembles. Its best submission achieved macro-averaged F1-scores of 66.8%, 69.9%, and 72.5%, respectively.

This paper presents the contribution of the Data Science Kitchen to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task aims at extending the identification of offensive language by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Classifier ensembles are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9%, and 72.5% for the identification of toxic, engaging, and fact-claiming comments, respectively.
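
The majority voting scheme and the macro-averaged F1-score mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_vote` and `macro_f1`, the binary 0/1 label encoding, and the toy predictions are assumptions made for illustration. The macro average here is the unweighted mean of per-class F1-scores, which may differ from the definition used by the shared task's official scorer.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` is a list with one label list per classifier (hypothetical
    input format). With an odd number of binary classifiers no ties occur;
    otherwise the label seen first among the tied ones wins.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: three classifiers voting on five comments for one subtask.
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 1, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]
voted = majority_vote([clf_a, clf_b, clf_c])
score = macro_f1([1, 0, 0, 1, 0], voted)
```

In this sketch each subtask (toxic, engaging, fact-claiming) would be handled as a separate binary problem with its own ensemble, and the vote is taken independently per comment.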
