CL · Aug 19, 2016

Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

arXiv:1608.05605v1 · 27 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of ambiguous terms in biomedical and clinical text, which matters to researchers and practitioners. It is an incremental advance because it builds on existing word representation techniques.

The paper tackles word sense disambiguation in biomedical and clinical text by combining word representations trained on large corpora with UMLS definitions to create concept representations. It achieves performance comparable to previous methods on the MSH-WSD dataset without using any relational information.

In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text. We combine word representations created on large corpora with a small number of definitions from the UMLS to create concept representations, which we then compare to representations of the context of ambiguous terms. Using no relational information, we obtain comparable performance to previous approaches on the MSH-WSD dataset, which is a well-known dataset in the biomedical domain. Additionally, our method is fast and easy to set up and extend to other domains. Supplementary materials, including source code, can be found at https://github.com/clips/yarn
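The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that a concept representation is the plain average of the word vectors in its definitions, that the context representation is the average of the context word vectors, and that the predicted concept is the one with the highest cosine similarity to the context. The toy two-dimensional embeddings, the concept labels `common_cold` and `cold_temperature`, and all function names are hypothetical; this is not the code in the yarn repository, and the paper's actual weighting and context window may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(words: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the in-vocabulary words; unknown words are skipped."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        raise ValueError("none of the words have an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(
    context: Sequence[str],
    definitions: Dict[str, List[List[str]]],
    emb: Dict[str, Vector],
) -> str:
    """Return the concept whose averaged definition vector is closest to the context vector."""
    ctx = mean_vector(context, emb)
    scores = {}
    for concept, defs in definitions.items():
        # Pool the tokens of all (few) definitions of this concept into one representation.
        tokens = [tok for d in defs for tok in d]
        scores[concept] = cosine(ctx, mean_vector(tokens, emb))
    return max(scores, key=scores.get)


# Hypothetical toy data: real use would load pretrained embeddings and UMLS definitions.
emb = {
    "virus": [1.0, 0.0], "infection": [0.9, 0.1], "nose": [0.8, 0.2],
    "temperature": [0.0, 1.0], "low": [0.1, 0.9], "ice": [0.2, 0.8],
}
definitions = {
    "common_cold": [["virus", "infection", "nose"]],
    "cold_temperature": [["low", "temperature"], ["ice"]],
}

# "patient" has no embedding and is ignored when building the context vector.
print(disambiguate(["patient", "virus", "nose"], definitions, emb))  # common_cold
```

Averaging is the simplest way to compose word vectors into one representation, and it needs no relational information from the UMLS, only definition text; this is what keeps such an approach fast to set up for new domains.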

Code Implementations: 2 repos
