Creation of an Annotated Corpus of Spanish Radiology Reports
This work addresses a domain-specific problem for biomedical NLP researchers by providing a new dataset; its contribution is incremental, since it applies existing annotation methods to new data.
The authors tackled the scarcity of annotated biomedical resources by creating a corpus of 513 anonymized Spanish radiology reports, manually labeled with entities, negation and uncertainty terms, and relations. The corpus is intended as an evaluation resource for named entity recognition and relation extraction algorithms.
This paper presents a new annotated corpus of 513 anonymized radiology reports written in Spanish. The reports were manually annotated with entities, negation and uncertainty terms, and relations. The corpus was conceived both as an evaluation resource for named entity recognition and relation extraction algorithms and as input for supervised methods. Annotated biomedical resources are scarce because of confidentiality issues and their associated costs. This work also provides guidelines that could help other researchers undertake similar tasks.
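Because the corpus is meant to serve as a gold standard for named entity recognition and relation extraction, a small sketch of how system output might be scored against such annotations may help. It assumes exact-match scoring on character-offset spans. The `Entity` and `Relation` classes, the label names (`"Negation"`, `"Finding"`, `"Negated"`), and the example sentence are hypothetical and are not taken from the corpus's actual annotation scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    label: str  # hypothetical entity type, not the corpus's real label set

@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical relation type

def prf(gold, predicted):
    """Exact-match precision, recall and F1 over two collections of hashable items."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # items that match exactly (span and label)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example sentence ("No pulmonary nodules are observed.")
text = "No se observan nódulos pulmonares."

gold = [Entity(0, 2, "Negation"), Entity(15, 33, "Finding")]  # "No", "nódulos pulmonares"
pred = [Entity(0, 2, "Negation"), Entity(15, 22, "Finding")]  # the system found only "nódulos"

gold_rel = [Relation(gold[0], gold[1], "Negated")]
pred_rel = [Relation(pred[0], pred[1], "Negated")]

print(prf(gold, pred))          # entity-level scores
print(prf(gold_rel, pred_rel))  # relation-level scores
```

Under exact matching, the partially detected finding counts as both a false positive and a false negative, and the relation built on it fails as well. Many evaluations also report a lenient, overlap-based variant; which scheme fits depends on how the gold standard is intended to be used.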