LG · CL · Aug 6, 2016

Transferring Knowledge from Text to Predict Disease Onset

arXiv:1608.02071v1 · 1 citation
AI Analysis

This work addresses the challenge of building accurate predictive models in medicine with scarce data, though it is incremental as it adapts existing text-based methods to a specific domain.

The paper tackles the problem of limited training data in medical prediction by using domain expertise extracted from text to improve model accuracy. The resulting models also select 60% fewer features, which makes them easier for physicians to interpret.

In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature's text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
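
The rescaling idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the two-dimensional toy vectors stand in for word2vec embeddings of feature and outcome descriptions, cosine similarity clipped to a small floor stands in for the relevance estimate, and a plain proximal-gradient L1 logistic regression stands in for the regularized model. The function names are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relevance_scores(feature_vecs, outcome_vec, floor=0.05):
    """Assumed relevance estimate: cosine similarity, clipped to [floor, 1]."""
    sims = np.array([cosine(f, outcome_vec) for f in feature_vecs])
    return np.clip(sims, floor, 1.0)

def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=3000):
    """L1-regularized logistic regression (no intercept) via proximal gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
        w = w - lr * (X.T @ (p - y)) / len(y)        # gradient step on the log loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

# Toy stand-ins for embeddings (hypothetical values, not real word2vec output).
outcome_vec = np.array([1.0, 0.0])
feature_vecs = [np.array([1.0, 0.0]),   # description close to the outcome
                np.array([0.0, 1.0])]   # description unrelated to the outcome
relevance = relevance_scores(feature_vecs, outcome_vec)

# Two features that carry the same signal in the data.
x = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
X = np.column_stack([x, x])
y = x.copy()

w_plain = fit_l1_logistic(X, y)               # uniform penalty on raw features
w_scaled = fit_l1_logistic(X * relevance, y)  # penalty is weaker for relevant features
```

Multiplying column j by its relevance r_j and then applying a uniform L1 penalty is equivalent to penalizing the original weight by lam / r_j, so features with higher relevance are regularized less. In this toy setup the unscaled fit keeps both duplicated features, while the rescaled fit drives the weight of the low-relevance feature to zero.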

