Jacob Oppenheim

LG
3papers
4citations
Novelty38%
AI Score19

3 Papers

LGNov 16, 2022
Molecular Fingerprints for Robust and Efficient ML-Driven Molecular Generation

Ruslan N. Tazhigulov, Joshua Schiller, Jacob Oppenheim et al.

We propose a novel molecular fingerprint-based variational autoencoder applied for molecular generation on real-world drug molecules. We define more suitable and pharma-relevant baseline metrics and tests, focusing on the generation of diverse, drug-like, novel small molecules and scaffolds. When we apply these molecular generation metrics to our novel model, we observe a substantial improvement in chemical synthetic accessibility ($Δ\bar{SAS}$ = -0.83) and in computational efficiency up to 5.9x in comparison to an existing state-of-the-art SMILES-based architecture.

CLNov 14, 2022
Semantic Decomposition Improves Learning of Large Language Models on EHR Data

David A. Bloore, Romane Gauriau, Anna L. Decker et al.

Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes as opposed to only ingesting the leaves and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes.

LGNov 14, 2022
Phenotype Detection in Real World Data via Online MixEHR Algorithm

Ying Xu, Romane Gauriau, Anna Decker et al.

Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version allowing us to use it on order of magnitude larger datasets including a large, US-based claims dataset and a rich regional EHR dataset. In addition to recapitulating previously observed disease groups, we discovered clinically meaningful disease subtypes and comorbidities. This work scaled up an effective unsupervised learning method, reinforced existing clinical knowledge, and is a promising approach for efficient collaboration with clinicians.