Inferring disease correlation from healthcare data
This work addresses the problem of knowledge extraction from healthcare data for clinicians, but it is incremental as it builds on existing methods like NLP and concept mapping.
The study tackled the challenge of extracting disease correlations from unstructured Electronic Health Records by analyzing discharge summaries of obesity patients, resulting in the identification of binary disease relations for co-morbidity and risk factors validated against biomedical literature and gene interaction networks.
Electronic Health Records (EHRs) maintained in healthcare settings are a potential source of substantial clinical knowledge. The massive volume of data, the unstructured nature of the records, and the need for domain expertise together make knowledge extraction from them challenging. This study aims to overcome that challenge through methodical analysis, abstraction, and summarization of such data, and attempts to explain clinical observations using biomedical and genomic data. Discharge summaries of obesity patients were processed to extract coherent patterns, supported by Machine Learning and Natural Language Processing technologies, a concept mapping tool, and biomedical, clinical, and genomic knowledge bases. Semantic relations between diseases were extracted and filtered with a chi-square test to remove spurious relations. The remaining relations were validated against biomedical literature and gene interaction networks. A collection of binary relations between diseases was derived from the data: one set implied co-morbidity, while the other contained diseases that are risk factors for others. Validation against biomedical literature made the correlations between diseases more plausible. The gene interaction network revealed that the diseases are related and that their corresponding genes lie in close proximity. Conclusion: This study focuses on deducing meaningful relations between diseases from discharge summaries. For analytical purposes, the scope was limited to a few common, well-researched diseases. It can be extended to incorporate relatively unknown, complex diseases and to discover new traits that help in clinical assessments.
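The chi-square filtering step can be illustrated with a minimal sketch. It is not the study's implementation: it assumes each discharge summary has already been reduced to a set of disease names, builds a 2x2 co-occurrence table per disease pair, and keeps pairs whose Pearson chi-square statistic exceeds the standard critical value for one degree of freedom at a 0.05 significance level (3.841). The function names, the toy records, the choice of significance level, and the restriction to positively associated pairs are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
# The significance level is an assumption; the source does not state one.
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied. Returns 0.0 when a row or
    column total is zero, since the statistic is undefined there.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def filter_disease_pairs(records, critical=CHI2_CRITICAL_DF1_ALPHA_05):
    """Return {(disease_x, disease_y): statistic} for pairs that pass the test.

    `records` is a list of sets of disease names, one set per summary
    (a hypothetical pre-processed form of the discharge summaries).
    """
    n = len(records)
    single = Counter(d for rec in records for d in rec)
    joint = Counter(
        pair for rec in records for pair in combinations(sorted(rec), 2)
    )
    kept = {}
    for (x, y), both in joint.items():
        only_x = single[x] - both
        only_y = single[y] - both
        neither = n - both - only_x - only_y
        stat = chi_square_2x2(both, only_x, only_y, neither)
        # Assumption: keep only positive associations, i.e. pairs that
        # co-occur more often than independence would predict.
        positive = both * neither > only_x * only_y
        if stat > critical and positive:
            kept[(x, y)] = stat
    return kept


# Toy data, invented purely for illustration (not from the study).
toy_records = (
    [{"diabetes", "hypertension"}] * 4
    + [{"diabetes", "hypertension", "asthma"}]
    + [{"asthma"}] * 2
    + [set()] * 3
)
kept = filter_disease_pairs(toy_records)
print(kept)
```

On real data the expected cell counts should be checked before trusting the statistic, since the chi-square approximation is unreliable for small counts like those in the toy records; an exact test may be preferable there.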