CVNov 2, 2023
Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosisThiago Santos, Harish Kamath, Christopher R. McAdams et al.
Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Thus, such system can reduce the burden for populating tumor registries, help registration for clinical trial as well as developing large dataset for deep learning model development using true pathologic ground truth. However, the content of breast pathology reports can be difficult for categorize due to the high linguistic variability in content and wide variety of potential diagnoses >50. Existing NLP models are primarily focused on developing classifier for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnosis of cancer subtypes. We then developed a hierarchical hybrid transformer-based pipeline (59 labels) - Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique and compared our model to several state of the art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets - MGH and Mayo Clinic. We publicly release the code and a live application under Huggingface spaces repository
LGJan 1, 2024
Robust Meta-Model for Predicting the Need for Blood Transfusion in Non-traumatic ICU PatientsAlireza Rafiei, Ronald Moore, Tilendra Choudhary et al.
Objective: Blood transfusions, crucial in managing anemia and coagulopathy in ICU settings, require accurate prediction for effective resource allocation and patient risk assessment. However, existing clinical decision support systems have primarily targeted a particular patient demographic with unique medical conditions and focused on a single type of blood transfusion. This study aims to develop an advanced machine learning-based model to predict the probability of transfusion necessity over the next 24 hours for a diverse range of non-traumatic ICU patients. Methods: We conducted a retrospective cohort study on 72,072 adult non-traumatic ICU patients admitted to a high-volume US metropolitan academic hospital between 2016 and 2020. We developed a meta-learner and various machine learning models to serve as predictors, training them annually with four-year data and evaluating on the fifth, unseen year, iteratively over five years. Results: The experimental results revealed that the meta-model surpasses the other models in different development scenarios. It achieved notable performance metrics, including an Area Under the Receiver Operating Characteristic (AUROC) curve of 0.97, an accuracy rate of 0.93, and an F1-score of 0.89 in the best scenario. Conclusion: This study pioneers the use of machine learning models for predicting blood transfusion needs in a diverse cohort of critically ill patients. The findings of this evaluation confirm that our model not only predicts transfusion requirements effectively but also identifies key biomarkers for making transfusion decisions.
IVFeb 8, 2022
The EMory BrEast imaging Dataset (EMBED): A Racially Diverse, Granular Dataset of 3.5M Screening and Diagnostic MammogramsJiwoong J. Jeong, Brianna L. Vey, Ananth Reddy et al.
Developing and validating artificial intelligence models in medical imaging requires datasets that are large, granular, and diverse. To date, the majority of publicly available breast imaging datasets lack in one or more of these areas. Models trained on these data may therefore underperform on patient populations or pathologies that have not previously been encountered. The EMory BrEast imaging Dataset (EMBED) addresses these gaps by providing 3650,000 2D and DBT screening and diagnostic mammograms for 116,000 women divided equally between White and African American patients. The dataset also contains 40,000 annotated lesions linked to structured imaging descriptors and 61 ground truth pathologic outcomes grouped into six severity classes. Our goal is to share this dataset with research partners to aid in development and validation of breast AI models that will serve all patients fairly and help decrease bias in medical AI.