CL · IR · LG · ML · Sep 1, 2018

Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

arXiv:1809.01499v21099 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable AI in content moderation for social media platforms, though it appears incremental as it builds on existing extractive explanation architectures.

The researchers address the problem of producing high-recall explanations for neural text classifiers. They add an adversarial layer that scans the residual of a hard-attention explanation for remaining predictive signal, apply the method to detecting personal attacks in social media comments, and evaluate it on a human-annotated validation set.

We introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. Building on an existing architecture for extractive explanations via hard attention, we add an adversarial layer which scans the residual of the attention for remaining predictive signal. Motivated by the important domain of detecting personal attacks in social media comments, we additionally demonstrate the importance of manually setting a semantically appropriate 'default' behavior for the model by explicitly manipulating its bias term. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes.
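The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words scorer, the `WEIGHTS` lexicon, the `BIAS` value, and the hand-written masks are all hypothetical stand-ins for learned neural components, and the same scorer is reused as the "adversary" purely for brevity. It shows only that (1) a hard-attention mask splits a comment into a rationale and a residual, and an adversary that can still predict the label from the residual signals a low-recall explanation, and (2) a negative bias makes empty input default to "no attack".

```python
import math

# Hypothetical toy lexicon; in the paper these would be learned parameters.
WEIGHTS = {"idiot": 3.0, "stupid": 2.5, "you": 0.5}

# Assumed value, set by hand so that empty input scores as "no attack".
BIAS = -3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(tokens, bias=BIAS):
    """Toy classifier: probability that the token list is a personal attack."""
    score = bias + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return sigmoid(score)


def split_by_mask(tokens, mask):
    """Hard attention: mask[i] == 1 puts token i in the rationale, else in the residual."""
    rationale = [t for t, m in zip(tokens, mask) if m == 1]
    residual = [t for t, m in zip(tokens, mask) if m == 0]
    return rationale, residual


tokens = "you are a stupid idiot".split()

# Two hand-written masks standing in for a learned extractor.
masks = {
    "low recall": [0, 0, 0, 0, 1],   # selects only "idiot"
    "high recall": [1, 0, 0, 1, 1],  # selects "you", "stupid", "idiot"
}

for name, mask in masks.items():
    rationale, residual = split_by_mask(tokens, mask)
    # The adversary sees only the residual; a high probability here means
    # the explanation left predictive signal behind.
    print(name, rationale, round(predict(residual), 3))

# With the negative bias, an empty input defaults to "no attack".
print("empty input:", round(predict([]), 3))
```

In this toy setup the low-recall mask leaves "stupid" in the residual, so the adversary's score on the residual stays higher than under the high-recall mask. In the setting the abstract describes, that residual signal would be used during training to push the extractor toward more complete explanations.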

