TTM-RE: Memory-Augmented Document-Level Relation Extraction
This work improves document-level relation extraction accuracy for NLP researchers and practitioners by making effective use of noisy training data. The contribution is incremental: it builds on existing memory-augmented and noisy-robust techniques.
The paper addresses a weakness of earlier document-level relation extraction methods: they make poor use of large-scale noisy training data. It proposes TTM-RE, which combines a trainable memory module with a noisy-robust loss function and achieves state-of-the-art performance on the ReDocRED benchmark, with an absolute F1 score improvement of over 3%.
Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-quality, distantly supervised training data generally do not perform better than those trained solely on the smaller, high-quality, human-annotated training data. To unlock the full potential of large-scale noisy training data for document-level relation extraction, we propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noisy-robust loss function that accounts for the positive-unlabeled setting. Extensive experiments on ReDocRED, a benchmark dataset for document-level relation extraction, reveal that TTM-RE achieves state-of-the-art performance (with an absolute F1 score improvement of over 3%). Ablation studies further illustrate the superiority of TTM-RE in other domains (the ChemDisGene dataset in the biomedical domain) and under highly unlabeled settings.
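To illustrate what a loss that accounts for the positive-unlabeled setting can look like, the following is a minimal Python sketch of a generic non-negative positive-unlabeled risk estimate for a single relation class. It is not claimed to be the loss used in TTM-RE, whose exact form is not given in this abstract. The function names, the sigmoid surrogate loss, and the class prior value are all assumptions made for the example.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss 1 / (1 + exp(label * score)), label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative positive-unlabeled risk estimate for one relation class.

    pos_scores: model scores for entity pairs annotated with the relation.
    unl_scores: model scores for unlabeled pairs (may hide true positives).
    prior:      assumed fraction of true positives among all pairs
                (a hypothetical hyperparameter for this sketch).
    """
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # Unlabeled pairs are a mix of positives and negatives, so subtract the
    # share of the "negative" risk that is attributable to hidden positives.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg

    # Clamp at zero so the estimated negative-class risk never goes negative.
    return prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    print(nn_pu_risk([2.0, 1.5], [-1.0, 0.5, -2.0], prior=0.1))
```

With all scores at zero and a prior of 0.5, every loss term equals 0.5 and the estimate is 0.5. The clamp matters when the unlabeled pairs are scored confidently negative: the corrected negative-class term would otherwise drop below zero, which is one known way such estimators overfit.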