Ashutosh Tewari

CL | Jul 23, 2023

Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data

Basil Kaufmann, Dallin Busby, Chandan Krushna Das et al.

Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained in PDF documents, such as those found in electronic health records.

Materials and Methods: A data abstraction tool based on OpenAI's GPT-3.5 model was developed and compared with three physician human abstractors in terms of time to task completion and accuracy when abstracting 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The software tool processed the reports in both vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was tested for superiority in abstraction speed and for non-inferiority in accuracy.

Results: The human abstractors required a mean of 101 s per report, with times ranging from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and 15.8 s to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 data points. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving non-inferior to the human abstractors at a margin of -10% (α = 0.025). With the scanned reports, the tool had a lower accuracy of 88.7% and was non-inferior to 2 of the 3 human abstractors.

Conclusion: The developed zero-shot learning NLP tool gives researchers accuracy comparable to that of human abstractors, with significant time savings. Because it requires no task-specific model training, the tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.
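The non-inferiority claim can be made concrete with a small calculation. The abstract does not say which test was used, so the following is only a minimal sketch under stated assumptions: it uses an unpaired Wald confidence bound for a difference of two proportions, treats the 2786 data points (14 variables × 199 reports) per abstractor as independent, and declares the tool non-inferior when the one-sided (1 − α) lower bound on (tool accuracy − human accuracy) lies above the −10% margin. Because the tool and the humans rated the same items, a paired method would suit the real data better. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_lower_bound(p_tool, p_ref, n_tool, n_ref, alpha=0.025):
    """One-sided (1 - alpha) Wald lower bound on p_tool - p_ref.

    Assumes two independent binomial samples (an unpaired simplification).
    """
    diff = p_tool - p_ref
    se = sqrt(p_tool * (1 - p_tool) / n_tool + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = 0.025
    return diff - z * se


def is_noninferior(p_tool, p_ref, n_tool, n_ref, margin=-0.10, alpha=0.025):
    """Non-inferior if the whole one-sided interval lies above the margin."""
    return noninferiority_lower_bound(p_tool, p_ref, n_tool, n_ref, alpha) > margin


N = 14 * 199                      # 2786 data points per abstractor
humans = [0.947, 0.978, 0.964]    # reported human accuracies

print([is_noninferior(0.942, p, N, N) for p in humans])  # vectorized reports
print([is_noninferior(0.887, p, N, N) for p in humans])  # scanned reports
```

With the reported accuracies plugged in, this simplified test prints `[True, True, True]` for the vectorized reports and `[True, False, True]` for the scanned ones: the lower bound against the 97.8% abstractor is about −10.4%, just past the margin.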

AP | May 2, 2022

Modeling and mitigation of occupational safety risks in dynamic industrial environments

Ashutosh Tewari, Antonio R. Paiva

Identifying and mitigating safety risks is paramount in many industries. In addition to guidelines and best practices, many industries already have safety management systems (SMSs) designed to monitor and reinforce good safety behaviors. However, the tools for analyzing the data acquired through such systems are still limited in their ability to robustly quantify the risks posed by various occupational hazards. Moreover, best practices and modern SMSs cannot account for the dynamically evolving environments and behavioral characteristics common in many industrial settings. This article proposes a method that addresses these issues by enabling continuous, quantitative, data-driven assessment of safety risks. The backbone of our method is an intuitive hierarchical probabilistic model that explains the sparse and noisy safety data collected by a typical SMS. A fully Bayesian approach is developed to calibrate this model from safety data in an online fashion. The calibrated model then holds the information needed to characterize the risk posed by different safety hazards. The model can also be leveraged for automated decision making, for instance to solve the risk-mitigation resource allocation problems often encountered in resource-constrained industrial environments. The methodology is rigorously validated on a simulated test-bed, and its scalability is demonstrated on real data from large maintenance projects at a petrochemical plant.
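To illustrate the general idea of online Bayesian calibration on sparse safety data, here is a minimal sketch that is deliberately much simpler than the paper's model, which the abstract does not specify. It assumes each SMS observation of a hazard is recorded as either safe or unsafe, gives every hazard's unsafe-probability a shared Beta(a0, b0) prior so that sparsely observed hazards shrink toward a common baseline, and updates each posterior in closed form as new counts arrive. Unlike the paper's fully Bayesian approach, the prior hyperparameters here are fixed. The risk score (posterior mean × severity), the severity weights, the hazard names, and the greedy allocation rule are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class HazardPosterior:
    """Beta posterior over the probability that an observation of a hazard is unsafe."""
    a: float  # prior pseudo-count + observed unsafe observations
    b: float  # prior pseudo-count + observed safe observations

    def update(self, unsafe: int, safe: int) -> None:
        # Conjugate Beta-Binomial update: add the new counts (online, batch by batch).
        self.a += unsafe
        self.b += safe

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


class SafetyRiskModel:
    """Shared Beta(a0, b0) prior (fixed hyperparameters, an assumption) across hazards."""

    def __init__(self, a0: float = 1.0, b0: float = 19.0):
        self.a0, self.b0 = a0, b0
        self.hazards: dict[str, HazardPosterior] = {}

    def observe(self, hazard: str, unsafe: int, safe: int) -> None:
        post = self.hazards.setdefault(hazard, HazardPosterior(self.a0, self.b0))
        post.update(unsafe, safe)

    def risk(self, hazard: str, severity: float) -> float:
        # Hypothetical risk score: P(unsafe) * severity; unseen hazards use the prior mean.
        post = self.hazards.get(hazard, HazardPosterior(self.a0, self.b0))
        return post.mean * severity

    def allocate(self, severities: dict[str, float], budget: int) -> list[str]:
        # Greedy rule: give the limited mitigation resources to the highest-risk hazards.
        ranked = sorted(severities, key=lambda h: self.risk(h, severities[h]), reverse=True)
        return ranked[:budget]


model = SafetyRiskModel()
model.observe("working_at_height", unsafe=3, safe=7)
model.observe("hot_work", unsafe=0, safe=40)
severities = {"working_at_height": 5.0, "hot_work": 8.0, "confined_space": 9.0}
print(model.allocate(severities, budget=2))  # ['working_at_height', 'confined_space']
```

The unobserved "confined_space" hazard still outranks "hot_work" because it falls back on the prior mean (0.05) instead of a data-driven estimate near zero. Handling drifting environments, one of the article's motivations, would need extra machinery, such as discounting old counts, that this sketch omits.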
