CLMar 11, 2016

Sieve-based Coreference Resolution in the Biomedical Domain

arXiv:1603.03758v221 citations
Originality Synthesis-oriented
AI Analysis

This work addresses coreference resolution specifically for biomedical texts, where existing methods fail due to lack of domain-specific cues, representing an incremental improvement tailored to this domain.

The paper tackles the problem of poor performance of domain-general coreference resolution algorithms on biomedical documents by introducing a sieve-based architecture that leverages domain knowledge, resulting in a 3.2% increase in throughput for the Reach event extraction system while maintaining precision.

We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do not encode domain-specific knowledge such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes