Combining Spans into Entities: A Neural Two-Stage Approach for Recognizing Discontiguous Entities
This work addresses entity recognition in medical texts, where entities can be discontiguous or overlapping. The contribution is incremental: it builds on existing methods while improving performance.
The paper tackles the problem of recognizing discontiguous and overlapping entities in medical documents by proposing a neural two-stage approach that first detects overlapping spans and then combines them into entities, achieving state-of-the-art performance on a standard dataset without external features.
In medical documents, an entity of interest may not only contain a discontiguous sequence of words but also overlap with another entity. Entities with such structures are intrinsically hard to recognize because the space of possible entity combinations is large. In this work, we propose a neural two-stage approach to recognize discontiguous and overlapping entities by decomposing the problem into two subtasks: 1) it first detects all the overlapping spans that either form entities on their own or serve as segments of discontiguous entities, based on a segmental hypergraph representation; 2) it then learns to combine these segments into discontiguous entities with a classifier, which filters out incorrect combinations of segments. Two neural components are designed for these subtasks, one for each, and they are learned jointly using a shared text encoder. Our model achieves state-of-the-art performance on a standard dataset, even in the absence of the external features that previous methods used.
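The second subtask, combining detected segments into discontiguous entities, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `max_segments` cap, the rule that segments of one entity must not overlap, and the toy sentence are all assumptions made for illustration, and the learned neural classifier is replaced here by a simple lookup in a hand-written set.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Sequence, Tuple

Span = Tuple[int, int]          # inclusive token indices (start, end)
Combo = Tuple[Span, ...]        # ordered segments of one candidate entity


def non_overlapping(segments: Sequence[Span]) -> bool:
    """True if no two segments share a token (assumed validity rule)."""
    ordered = sorted(segments)
    return all(a[1] < b[0] for a, b in zip(ordered, ordered[1:]))


def candidate_combinations(spans: Sequence[Span],
                           max_segments: int = 3) -> Iterator[Combo]:
    """Enumerate combinations of 2..max_segments non-overlapping spans."""
    for k in range(2, max_segments + 1):
        for combo in combinations(sorted(spans), k):
            if non_overlapping(combo):
                yield combo


def combine(spans: Sequence[Span],
            keep: Callable[[Combo], bool],
            max_segments: int = 3) -> List[Combo]:
    """Stage 2: keep only the combinations the classifier accepts."""
    return [c for c in candidate_combinations(spans, max_segments) if keep(c)]


# Toy input: "muscle pain and fatigue"
# Hypothetical stage-1 output: "muscle" (0,0), "muscle pain" (0,1), "fatigue" (3,3)
stage1_spans: List[Span] = [(0, 0), (0, 1), (3, 3)]

# Stand-in for the learned classifier: accept only "muscle ... fatigue".
accepted = {((0, 0), (3, 3))}
entities = combine(stage1_spans, keep=lambda combo: combo in accepted)
```

In this toy case, the overlapping pair "muscle" and "muscle pain" is never proposed as one entity, and the stand-in classifier rejects "muscle pain ... fatigue" while keeping "muscle ... fatigue". The number of candidates grows combinatorially with the number of detected spans, which is why the sketch caps the number of segments per entity.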