CL · IR · Nov 6, 2020

From Dataset Recycling to Multi-Property Extraction and Beyond

arXiv:2011.03228v1 · 995 citations
AI Analysis

This work addresses dataset limitations in information extraction research. Its contribution is incremental, since it builds on existing datasets and tasks.

The paper tackles information extraction and machine reading comprehension. It proposes a dual-source Transformer model that outperforms the previous state of the art on the WikiReading dataset by a large margin, and it introduces a new dataset, WikiReading Recycled, for multiple property extraction that avoids the limitations of the original WikiReading.

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state of the art by a large margin. Next, we introduce WikiReading Recycled, a newly developed public dataset, together with the task of multiple property extraction. It uses the same data as WikiReading but does not inherit the disadvantages identified in its predecessor. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.
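Because WikiReading Recycled reuses the WikiReading data, one way to picture the shift from single-property to multiple property extraction is to regroup per-property queries by article, so that a model predicts the values of several properties for one text at once. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`article_id`, `text`, `prop`, `values`), the helper names `recycle` and `to_seq2seq`, and the " | " / " || " serialization format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SingleProperty:
    """Hypothetical WikiReading-style query: one article, one property, its value(s)."""
    article_id: str
    text: str
    prop: str
    values: tuple[str, ...]


@dataclass
class MultiProperty:
    """Hypothetical multi-property example: one article and all of its queried properties."""
    article_id: str
    text: str
    targets: dict[str, list[str]] = field(default_factory=dict)


def recycle(records):
    """Group single-property records by article into multi-property examples."""
    grouped: dict[str, MultiProperty] = {}
    for r in records:
        ex = grouped.setdefault(r.article_id, MultiProperty(r.article_id, r.text))
        # Merge values if the same property is queried more than once for an article.
        bucket = ex.targets.setdefault(r.prop, [])
        for v in r.values:
            if v not in bucket:
                bucket.append(v)
    return list(grouped.values())


def to_seq2seq(ex):
    """Serialize an example as a (source, target) string pair (assumed format)."""
    props = sorted(ex.targets)
    source = " | ".join(props) + " || " + ex.text
    target = " | ".join(f"{p}: {', '.join(ex.targets[p])}" for p in props)
    return source, target


# Usage with toy records (illustrative data, not taken from the dataset).
curie = "Marie Curie was a Polish and naturalised-French physicist."
records = [
    SingleProperty("Q1", curie, "occupation", ("physicist",)),
    SingleProperty("Q1", curie, "country of citizenship", ("Poland", "France")),
    SingleProperty("Q2", "Warsaw is the capital of Poland.", "country", ("Poland",)),
]
examples = recycle(records)
```

Grouping by article means the article text is encoded once per example rather than once per property, which is the practical appeal of the multi-property framing; how the actual dataset orders, splits, or serializes properties is not specified here.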

Code implementations: 1 repo