CL · Dec 4, 2020

Event Guided Denoising for Multilingual Relation Learning

arXiv:2012.02721v1
AI Analysis

This work provides a more data-efficient method for training multilingual relation extraction models, which is significant for researchers and practitioners in NLP who face high data collection costs.

This paper addresses the high data cost of distant supervision for relation extraction by proposing a method to collect high-quality training data from unlabeled text. The approach achieves zero-shot and few-shot results comparable to those of a state-of-the-art method while using far fewer training examples (50k vs. 300 million+).

General-purpose relation extraction has recently seen considerable gains, in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks. In this work, we present a methodology for collecting high-quality training data for relation extraction from unlabeled text that achieves a near-recreation of their zero-shot and few-shot results at a fraction of the training cost. Our approach exploits the predictable distributional structure of date-marked news articles to build a denoised corpus; the extraction process filters out low-quality examples. We show that a smaller multilingual encoder trained on this corpus performs comparably to the current state-of-the-art (when both receive little to no fine-tuning) on few-shot and standard relation benchmarks in English and Spanish, despite using far fewer examples (50k vs. 300 million+).

