Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes
This work addresses the lack of structured menstrual health data in women's healthcare, though its contribution is incremental: it builds on existing methods such as GatorTron and retrieval techniques.
The authors extract detailed menstrual characteristics from unstructured clinical notes with an NLP pipeline that combines GatorTron, multi-task prompt-based learning, and a hybrid retrieval preprocessing step. The pipeline achieves an average F1-score of 90% across all characteristics, despite being trained on fewer than 100 annotated notes.
Menstrual health is a critical yet often overlooked aspect of women's healthcare. Despite its clinical relevance, detailed data on menstrual characteristics is rarely available in structured medical records. To address this gap, we propose a novel Natural Language Processing pipeline to extract key menstrual cycle attributes: dysmenorrhea, regularity, flow volume, and intermenstrual bleeding. Our approach utilizes the GatorTron model with Multi-Task Prompt-based Learning, enhanced by a hybrid retrieval preprocessing step to identify relevant text segments. It outperforms baseline methods, achieving an average F1-score of 90% across all menstrual characteristics, despite being trained on fewer than 100 annotated clinical notes. The retrieval step consistently improves performance across all approaches, allowing the model to focus on the most relevant segments of lengthy clinical notes. These results show that combining multi-task learning with retrieval improves generalization and performance across menstrual characteristics, advancing automated extraction from clinical notes and supporting women's health research.
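The abstract does not say how the hybrid retrieval step is built, so the following is only a minimal sketch of one plausible reading: each sentence of a note is scored by a keyword match combined with a bag-of-words cosine similarity to a query, and the top-scoring sentences are kept. The keyword list, the query string, the weight `alpha`, and the function name `hybrid_retrieve` are all hypothetical and are not taken from the paper; a real system would likely use a stronger lexical scorer and learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list and query; the source specifies neither.
KEYWORDS = {"dysmenorrhea", "menses", "menstrual", "period", "periods",
            "cramps", "bleeding", "flow"}
QUERY = "menstrual cycle regularity flow volume dysmenorrhea intermenstrual bleeding"


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(note, top_k=2, alpha=0.5):
    """Return the top_k sentences of a note, in their original order.

    Score = alpha * keyword hit (0 or 1) + (1 - alpha) * cosine similarity
    to QUERY. Both the scoring rule and alpha are assumptions.
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]
    query_vec = Counter(tokenize(QUERY))

    scored = []
    for index, segment in enumerate(segments):
        tokens = tokenize(segment)
        lexical = 1.0 if KEYWORDS & set(tokens) else 0.0
        similarity = cosine(Counter(tokens), query_vec)
        scored.append((alpha * lexical + (1 - alpha) * similarity, index, segment))

    # Keep the highest-scoring segments, then restore document order.
    top = sorted(scored, key=lambda item: -item[0])[:top_k]
    return [segment for _, _, segment in sorted(top, key=lambda item: item[1])]
```

Under these assumptions, calling `hybrid_retrieve` on a note that mixes unrelated findings with menstrual history keeps only the sentences mentioning periods, flow, or bleeding, which is the kind of shortened input a downstream extraction model could then process.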