Emily Lin

h-index5
2papers

2 Papers

CLFeb 3, 2024
Data Quality Matters: Suicide Intention Detection on Social Media Posts Using RoBERTa-CNN

Emily Lin, Jian Sun, Hsingyu Chen et al.

Suicide remains a pressing global health concern, necessitating innovative approaches for early detection and intervention. This paper focuses on identifying suicidal intentions in posts from the SuicideWatch subreddit by proposing a novel deep-learning approach that utilizes the state-of-the-art RoBERTa-CNN model. The robustly Optimized BERT Pretraining Approach (RoBERTa) excels at capturing textual nuances and forming semantic relationships within the text. The remaining Convolutional Neural Network (CNN) head enhances RoBERTa's capacity to discern critical patterns from extensive datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset, yielding promising results. For instance, RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009. Additionally, we found that data quality significantly impacts the training of a robust model. To improve data quality, we removed noise from the text data while preserving its contextual content through either manually cleaning or utilizing the OpenAI API.

IVMay 3, 2021
Semi-supervised learning for generalizable intracranial hemorrhage detection and segmentation

Emily Lin, Esther Yuh

Purpose: To develop and evaluate a semi-supervised learning model for intracranial hemorrhage detection and segmentation on an out-of-distribution head CT evaluation set. Materials and Methods: This retrospective study used semi-supervised learning to bootstrap performance. An initial "teacher" deep learning model was trained on 457 pixel-labeled head CT scans collected from one US institution from 2010-2017 and used to generate pseudo-labels on a separate unlabeled corpus of 25000 examinations from the RSNA and ASNR. A second "student" model was trained on this combined pixel- and pseudo-labeled dataset. Hyperparameter tuning was performed on a validation set of 93 scans. Testing for both classification (n=481 examinations) and segmentation (n=23 examinations, or 529 images) was performed on CQ500, a dataset of 481 scans performed in India, to evaluate out-of-distribution generalizability. The semi-supervised model was compared with a baseline model trained on only labeled data using area under the receiver operating characteristic curve (AUC), Dice similarity coefficient (DSC), and average precision (AP) metrics. Results: The semi-supervised model achieved statistically significantly higher examination AUC on CQ500 compared with the baseline (0.939 [0.938, 0.940] vs. 0.907 [0.906, 0.908]) (p=0.009). It also achieved a higher DSC (0.829 [0.825, 0.833] vs. 0.809 [0.803, 0.812]) (p=0.012) and Pixel AP (0.848 [0.843, 0.853]) vs. 0.828 [0.817, 0.828]) compared to the baseline. Conclusion: The addition of unlabeled data in a semi-supervised learning framework demonstrates stronger generalizability potential for intracranial hemorrhage detection and segmentation compared with a supervised baseline.