CLJun 1
Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance CategorizationBaris Karacan, Vaibhav Bhargava, Barbara Di Eugenio et al.
Effective "all-team" summarization in high-complexity settings like the Neonatal Intensive Care Unit (NICU) requires aggregating insights from diverse disciplines (physicians, nurses, therapists) spread across hundreds of clinical free-text notes. Simply pooling heterogeneous text often leads to incoherent outputs. Structured summarization therefore first requires accurate categorization of sentence-level provenance across multi-source notes. This pilot study introduces a clinical provenance categorization pipeline using supervised fine-tuning (SFT) of large language models (LLMs). We adapted two Llama-3 models (8B and 70B) to MedSecId, a corpus of 2,002 MIMIC-III (Adult ICU) notes annotated with clinical provenance headers, achieving in-domain Macro F1 scores above 92% for both models. To evaluate cross-domain generalization, we assessed model capacity (8B vs. 70B) and quantization on a gold-standard dataset of 227 sentence-level spans derived from three multi-disciplinary NICU summaries. Experimental results demonstrate a scale-dependent transfer effect: while SFT produced only marginal changes for the 8B model, it substantially improved the 70B model, increasing Macro F1 by 7%. Notably, the quantized fine-tuned 70B model outperformed its full-precision baseline while substantially reducing computational requirements. These findings suggest that sufficient model capacity is critical for preserving semantic flexibility during cross-domain clinical transfer and that efficient quantized adaptation can enable structured provenance modeling for downstream summarization.
CLApr 1, 2024
A Neuro-Symbolic Approach to Monitoring Salt Content in FoodAnuja Tayal, Barbara Di Eugenio, Devika Salunke et al.
We propose a dialogue system that enables heart failure patients to inquire about salt content in foods and help them monitor and reduce salt intake. Addressing the lack of specific datasets for food-based salt content inquiries, we develop a template-based conversational dataset. The dataset is structured to ask clarification questions to identify food items and their salt content. Our findings indicate that while fine-tuning transformer-based models on the dataset yields limited performance, the integration of Neuro-Symbolic Rules significantly enhances the system's performance. Our experiments show that by integrating neuro-symbolic rules, our system achieves an improvement in joint goal accuracy of over 20% across different data sizes compared to naively fine-tuning transformer-based models.
CLMay 6, 2025
Towards conversational assistants for health applications: using ChatGPT to generate conversations about heart failureAnuja Tayal, Devika Salunke, Barbara Di Eugenio et al.
We explore the potential of ChatGPT (3.5-turbo and 4) to generate conversations focused on self-care strategies for African-American heart failure patients -- a domain with limited specialized datasets. To simulate patient-health educator dialogues, we employed four prompting strategies: domain, African American Vernacular English (AAVE), Social Determinants of Health (SDOH), and SDOH-informed reasoning. Conversations were generated across key self-care domains of food, exercise, and fluid intake, with varying turn lengths (5, 10, 15) and incorporated patient-specific SDOH attributes such as age, gender, neighborhood, and socioeconomic status. Our findings show that effective prompt design is essential. While incorporating SDOH and reasoning improves dialogue quality, ChatGPT still lacks the empathy and engagement needed for meaningful healthcare communication.