CL | AI | LG | Feb 24, 2020

Predicting Subjective Features of Questions of QA Websites using BERT

arXiv:2002.10107v4 | 19 citations
AI Analysis

This work addresses the slow and inefficient manual moderation processes on platforms like StackOverflow and Quora, though it is incremental in applying existing methods to a new dataset.

The authors tackled the problem of automating content moderation on Q&A websites by predicting 20 subjective quality aspects of questions, achieving a Mean-Squared-Error of 0.046 with a fine-tuned BERT model after only 2 epochs of training.

Community Question-Answering websites, such as StackOverflow and Quora, expect users to follow specific guidelines in order to maintain content quality. These systems mainly rely on community reports to assess content, an approach with serious problems: violations are handled slowly, regular and experienced users lose time, some reports are of low quality, and new users receive discouraging feedback. Therefore, with the overall goal of providing solutions for automating moderation actions on Q&A websites, we aim to provide a model that predicts 20 quality or subjective aspects of questions on QA websites. To this end, we used data gathered by the CrowdSource team at Google Research in 2019 and fine-tuned a pre-trained BERT model on our problem. Evaluated by Mean-Squared-Error (MSE), the model achieved a value of 0.046 after 2 epochs of training, which did not improve substantially in subsequent epochs. The results confirm that simple fine-tuning can yield accurate models in little time and with less data.
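To make the evaluation metric concrete, the following is a minimal sketch of how an MSE over 20 aspect scores per question could be computed. It assumes the error is averaged uniformly over every (question, aspect) pair and that scores lie in [0, 1]; neither detail is stated in the abstract. The function name `mean_squared_error`, the constant `NUM_ASPECTS`, and the sample data are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence

NUM_ASPECTS = 20  # the 20 subjective aspects predicted per question


def mean_squared_error(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
) -> float:
    """Average squared difference over every (question, aspect) pair.

    Assumption: all aspects are weighted equally; the paper may aggregate
    differently.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must cover the same questions")
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        if len(pred_row) != len(target_row):
            raise ValueError("each question needs one prediction per aspect")
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no scores to evaluate")
    return total / count


# Hypothetical data: two questions, each scored on 20 aspects in [0, 1].
targets = [[0.5] * NUM_ASPECTS, [1.0] * NUM_ASPECTS]
predictions = [[0.75] * NUM_ASPECTS, [0.5] * NUM_ASPECTS]
mse = mean_squared_error(predictions, targets)
print(f"MSE over {len(targets)} questions: {mse}")
```

Under this reading, a reported MSE of 0.046 would correspond to a typical per-aspect error of roughly 0.21 (its square root) on whatever scale the scores use.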
