CL, LG
Jan 11, 2021

Automating the Compilation of Potential Core-Outcomes for Clinical Trials

arXiv:2101.04076v1
1 citation

Originality: Incremental advance

AI Analysis

This work aims to make disparate clinical trial results easier for researchers to parse by automating the identification of core outcomes, an incremental improvement in data standardization for clinical trial analysis.

This paper addresses the lack of standardization in clinical trial outcome reporting by developing an automated natural language processing method for identifying probable core outcomes. The method combines BioBERT with an unsupervised, feature-based approach that compares embeddings by cosine similarity. Although some of the common outcomes it identified are untenable, the work establishes a pipeline for automating the process.

Increased access to clinical trial outcomes and analysis lets researchers and scientists iterate and improve on relevant approaches more effectively. However, the metrics and related results of clinical trials typically follow no reporting standard, which makes the results of different trials harder to parse. The objective of this paper is to describe an automated method that uses natural language processing to describe the probable core outcomes of clinical trials, thereby alleviating the issues around disparate clinical trial outcomes. Because the process is domain specific, BioBERT was employed to conduct a multi-class entity normalization task. In addition to BioBERT, an unsupervised feature-based approach was used that relies only on the encoder output embedding representations of the outcomes and labels. Finally, cosine similarity was calculated across the vectors to obtain the semantic similarity. This method harnessed both the domain-specific context of each token, drawn from the learned embeddings of the BioBERT model, and a more stable metric of sentence similarity. Some common outcomes identified using the Jaccard similarity in each of the classifications were compiled, and while some are untenable, a pipeline through which this automation could be conducted was established.
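The two similarity measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for BioBERT encoder embeddings, the label names and outcome strings are hypothetical, and the function names `cosine_similarity`, `nearest_label`, and `jaccard_similarity` are assumptions introduced here. Whitespace tokenization for the Jaccard step is also an assumption, since the abstract does not say how the sets were formed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_label(outcome_vec, label_vecs):
    """Return the label whose embedding is most similar to the outcome embedding."""
    return max(
        label_vecs,
        key=lambda name: cosine_similarity(outcome_vec, label_vecs[name]),
    )


def jaccard_similarity(text_a, text_b):
    """Size of the shared token set divided by the size of the combined token set."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Toy stand-ins for embeddings; real vectors would come from an encoder such as BioBERT.
label_vecs = {
    "mortality": [1.0, 0.0, 0.0],  # hypothetical label
    "pain": [0.0, 1.0, 0.0],       # hypothetical label
}
outcome_vec = [0.9, 0.1, 0.0]      # hypothetical outcome embedding

assigned = nearest_label(outcome_vec, label_vecs)
overlap = jaccard_similarity(
    "all-cause mortality at 30 days",
    "all-cause mortality at 90 days",
)
print(assigned, round(overlap, 3))
```

In this sketch cosine similarity assigns each outcome to a label, and Jaccard similarity then measures how much wording two outcomes in the same class share. That is one plausible reading of how the two measures fit together in the described pipeline.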
