LG / AP / ML · Feb 8, 2024

Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

arXiv:2402.05802v1 · 5 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses treatment failures caused by imprecise diagnosis, a problem for both clinicians and patients, and represents a domain-specific advance in medical AI.

The paper tackles imprecise clinical diagnosis by applying unsupervised machine learning, based on probabilistic independence, to discover latent disease signatures in electronic health records. The resulting 2000 signatures discriminated better than the original variables in a lung cancer prediction task and revealed pre-nodule signs of apparently undiagnosed cancer in many patients.

Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.
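The abstract does not name the inference algorithm here, but using probabilistic independence to separate mixed latent sources is the principle behind independent component analysis (ICA). The following is a minimal symmetric FastICA sketch in NumPy, assuming the observed variables are approximately linear mixtures of non-Gaussian, mutually independent sources. It illustrates that principle only and is not the authors' pipeline. The two sources, the six-variable mixing matrix, the noise level, and the function names `whiten`, `sym_decorrelate`, and `fast_ica` are all hypothetical. Real EHR data (9195 variables, 2000 signatures, sparse and irregularly sampled) would need substantial preprocessing that this toy example omits.

```python
import numpy as np


def whiten(X, n_components):
    """Center X (samples x variables), project onto the top principal
    components, and rescale so the projected data has identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
    top = np.argsort(eigvals)[::-1][:n_components]    # largest first
    K = eigvecs[:, top] / np.sqrt(eigvals[top])       # whitening matrix
    return Xc @ K                                     # samples x n_components


def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity (hypothetical helper).

    Returns estimated independent sources (samples x n_components),
    recovered only up to order, sign, and scale."""
    Z = whiten(X, n_components)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)                          # g(w_i^T z) for every sample
        g_prime = 1.0 - g ** 2                        # derivative of tanh
        # Fixed-point update for each row w: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate(g.T @ Z / n - np.diag(g_prime.mean(axis=0)) @ W)
        # Converged when every row points in the same direction as before
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


# Hypothetical toy data: two independent latent sources whose "imprint"
# appears as a noisy linear mixture across six observed variables.
rng = np.random.default_rng(42)
n_patients = 5000
S_true = np.column_stack([
    rng.laplace(size=n_patients),                           # sparse, heavy-tailed
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=n_patients),  # flat, light-tailed
])
A = rng.normal(size=(6, 2))
X = S_true @ A.T + 0.05 * rng.normal(size=(n_patients, 6))

S_est = fast_ica(X, n_components=2)
# Absolute correlation between each true source (rows) and each estimate (columns)
cross = np.abs(np.corrcoef(S_true.T, S_est.T)[:2, 2:])
print(cross.max(axis=1))
```

ICA recovers sources only up to permutation, sign, and scale, which is why the check compares absolute correlations and matches each true source to its best-correlated component. Whitening first reduces the search to an orthogonal rotation, and the symmetric decorrelation step keeps the unmixing matrix orthogonal on every iteration.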

