Dalton Schutte

CL
3papers
173citations
Novelty28%
AI Score20

3 Papers

CLOct 20, 2021
An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

Sijia Liu, Andrew Wen, Liwei Wang et al.

While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopt NLP models due to limited transparency, interpretability, and usability. In this study, we proposed an open natural language processing development framework. We evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interests in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold standard annotations were tested with a single institution's (Mayo) ruleset. This resulted in performances of 0.876, 0.706, and 0.694 in F-scores for Mayo, Minnesota, and Kentucky test datasets, respectively. The study as a consortium effort of the N3C NLP subgroup demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study and adoption. Although we use COVID-19 as a use case in this effort, our framework is general enough to be applied to other domains of interest in clinical NLP.

IRJun 24, 2021
Discovering novel drug-supplement interactions using a dietary supplements knowledge graph generated from the biomedical literature

Dalton Schutte, Jake Vasilakes, Anu Bompelli et al.

OBJECTIVE: Leverage existing biomedical NLP tools and DS domain terminology to produce a novel and comprehensive knowledge graph containing dietary supplement (DS) information for discovering interactions between DS and drugs, or Drug-Supplement Interactions (DSI). MATERIALS AND METHODS: We created SemRepDS (an extension of SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT-based model to remove incorrect relations before generating our knowledge graph (SuppKG). Two pathways are used to identify potential DS-Drug interactions which are then evaluated by medical professionals for mechanistic plausibility. RESULTS: Comparison analysis found that SemRepDS returned 206.9% more DS relations and 158.5% more DS entities than SemRep. The fine-tuned BERT model obtained an F1 score of 0.8605 and removed 43.86% of the relations, improving the precision of the relations by 26.4% compared to pre-filtering. SuppKG consists of 2,928 DS-specific nodes. Manual review of findings identified 44 (88%) proposed DS-Gene-Drug and 32 (64%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: The additional relations extracted using SemRepDS generated SuppKG that was used to find plausible DSI not found in the current literature. By the nature of the SuppKG, these interactions are unlikely to have been found using SemRep without the expanded DS terminology. CONCLUSION: We successfully extend SemRep to include DS information and produce SuppKG which can be used to find potential DS-Drug interactions.

CLOct 19, 2020
Drug Repurposing for COVID-19 via Knowledge Graph Completion

Rui Zhang, Dimitar Hristovski, Dalton Schutte et al.

Objective: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. Methods: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from both PubMed and COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant, and used this subset to construct a knowledge graph. Five SOTA, neural knowledge graph completion algorithms were used to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. Results: Accuracy classifier based on PubMedBERT achieved the best performance (F1= 0.854) in classifying semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1=0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as some candidate drugs that have not yet been studied. Discovery patterns enabled generation of plausible hypotheses regarding the relationships between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (paclitaxel, SB 203580, alpha 2-antiplasmin, pyrrolidine dithiocarbamate, and butylated hydroxytoluene) with their mechanistic explanations were further discussed. Conclusion: We show that an LBD approach can be feasible for discovering drug candidates for COVID-19, and for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions.