CL, AI, ML | Apr 4, 2018

Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

arXiv:1804.01486v3 | 203 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for high-quality medical concept embeddings to improve NLP applications in healthcare. It is an incremental advance: it builds on existing embedding methods, contributing new training data and a new benchmark.

The authors learn embeddings for medical concepts by combining a massive multimodal collection of insurance claims, clinical notes, and biomedical articles, producing the largest set of medical concept embeddings to date, covering 108,477 concepts. Their method, cui2vec, achieves state-of-the-art performance in most evaluations, and the authors release downloadable embeddings and an online exploration tool for researchers.
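Learning embeddings from pooled co-occurrence data can be illustrated with a minimal sketch. It assumes the counts from all sources have already been merged into one symmetric concept-by-concept co-occurrence matrix (hypothetical preprocessing), and it uses the shifted positive PMI plus truncated SVD construction from the word2vec-as-matrix-factorization analysis. This is one common reading of "recent theoretical insights", not necessarily the exact cui2vec pipeline; `sppmi_svd_embeddings`, `neg_samples`, and `cosine` are hypothetical names.

```python
import numpy as np


def sppmi_svd_embeddings(cooc: np.ndarray, dim: int, neg_samples: int = 1) -> np.ndarray:
    """Hypothetical helper: embed concepts from a symmetric co-occurrence matrix.

    Computes shifted positive PMI (SPPMI) and keeps the top `dim` SVD directions.
    """
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1)
    col = cooc.sum(axis=0)
    # PMI(i, j) = log(N_ij * N / (N_i * N_j)); zero counts give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / np.outer(row, col))
    # Treat undefined entries (never co-occurred) as -inf so they clip to zero.
    pmi = np.where(np.isfinite(pmi), pmi, -np.inf)
    # Shift by log(k) and clip negative values to zero.
    sppmi = np.maximum(pmi - np.log(neg_samples), 0.0)
    # Singular values come back in descending order; scale by sqrt(s).
    u, s, _ = np.linalg.svd(sppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

On a toy matrix with two groups of three concepts that co-occur mostly within their group, concepts in the same group end up with nearly identical directions and concepts in different groups end up nearly orthogonal. Scaling by the square root of the singular values is one of several conventions; using `u` alone is another.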

Word embeddings are a popular approach to unsupervised learning of word relationships and are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full-text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
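The abstract does not spell out the power-based benchmark, so the following is a sketch under an assumed reading rather than the paper's exact protocol. Cosine similarities between randomly drawn concept pairs form a null distribution; "power" is the fraction of known related pairs whose similarity exceeds the null's (1 - alpha) quantile. The function name `power_estimate` and its parameters (`alpha`, `n_null`, `seed`) are hypothetical.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def power_estimate(emb: np.ndarray, known_pairs, alpha: float = 0.05,
                   n_null: int = 1000, seed: int = 0) -> float:
    """Hypothetical benchmark: share of known pairs detected against a random-pair null."""
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    # Null distribution: similarities of random distinct concept pairs.
    null = []
    while len(null) < n_null:
        i, j = rng.integers(0, n, size=2)
        if i != j:
            null.append(cosine(emb[i], emb[j]))
    # A known pair counts as "detected" if it beats the (1 - alpha) quantile.
    threshold = np.quantile(null, 1 - alpha)
    hits = [cosine(emb[i], emb[j]) > threshold for i, j in known_pairs]
    return float(np.mean(hits))
```

With embeddings where each known pair is a near-duplicate, every pair is detected; with unrelated pairs, the detected fraction stays near `alpha`. A real benchmark would likely restrict the null to concepts of comparable type and report uncertainty, which this sketch omits.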

Code Implementations: 4 repos
