Separating Retention from Extraction in the Evaluation of End-to-end Relation Extraction
This work addresses generalization issues in NLP that matter to both researchers and practitioners, but its contribution is incremental, since it builds on prior work on shallow heuristics.
The paper tackles the problem of shallow heuristics in end-to-end relation extraction models. It shows that retention of training triples is a key factor of performance on standard benchmarks, and one experiment suggests that pipeline models using intermediate type representations are less prone to over-rely on retention.
State-of-the-art NLP models can adopt shallow heuristics that limit their generalization capability (McCoy et al., 2019). Such heuristics include lexical overlap with the training set in Named-Entity Recognition (Taillé et al., 2020) and Event or Type heuristics in Relation Extraction (Rosenman et al., 2020). In the more realistic end-to-end RE setting, we can expect yet another heuristic: the mere retention of training relation triples. In this paper, we propose several experiments confirming that retention of known facts is a key factor of performance on standard benchmarks. Furthermore, one experiment suggests that a pipeline model able to use intermediate type representations is less prone to over-rely on retention.
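To make the notion of retention concrete, the following is a minimal sketch of how gold test triples could be split according to whether they already appear in the training set, with recall reported separately on each part. It is an illustration under stated assumptions, not the evaluation protocol of the paper: the helper names (`normalize`, `split_by_retention`, `recall`), the case-insensitive string matching, the corpus-level (rather than per-sentence) comparison, and the toy triples are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# A relation triple as (head mention text, relation label, tail mention text).
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and strip argument strings so surface variants compare equal.

    Assumption: exact string match after normalization defines a "known" triple.
    """
    head, relation, tail = triple
    return (head.lower().strip(), relation, tail.lower().strip())


def split_by_retention(
    train: Iterable[Triple], gold_test: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Partition gold test triples into those seen in training and those unseen."""
    seen_in_train = {normalize(t) for t in train}
    gold = {normalize(t) for t in gold_test}
    return gold & seen_in_train, gold - seen_in_train


def recall(predicted: Iterable[Triple], gold_subset: Set[Triple]) -> float:
    """Fraction of the gold subset recovered by the predictions."""
    if not gold_subset:
        return 0.0
    pred = {normalize(t) for t in predicted}
    return len(pred & gold_subset) / len(gold_subset)


# Toy data (hypothetical): the model only recovers the triple it saw in training.
train = [("Paris", "located_in", "France"), ("Obama", "born_in", "Hawaii")]
gold_test = [("Paris", "located_in", "France"), ("Berlin", "located_in", "Germany")]
predicted = [("paris", "located_in", "france")]

seen, unseen = split_by_retention(train, gold_test)
recall_seen = recall(predicted, seen)
recall_unseen = recall(predicted, unseen)
```

A large gap between `recall_seen` and `recall_unseen` on real predictions would be consistent with a model relying on retention of known facts rather than on extraction from context.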