CL IR Nov 6, 2020

From Dataset Recycling to Multi-Property Extraction and Beyond

Tomasz Dwojak, Michał Pietruszka, Łukasz Borchmann, Jakub Chłędowski, Filip Graliński

arXiv:2011.03228v131.1995 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses limitations of existing information extraction datasets. It is an incremental advance, since it builds on existing datasets and tasks.

The paper tackles information extraction and machine reading comprehension. It proposes a dual-source Transformer model that outperforms the state of the art on the WikiReading dataset by a large margin, and it introduces WikiReading Recycled, a new dataset for multiple property extraction that avoids the identified disadvantages of its predecessor.

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled, a newly developed public dataset, and the task of multiple property extraction. It uses the same data as WikiReading but does not inherit its predecessor's identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.
