Fahmida Liza Piya

CL
h-index16
4papers
14citations
Novelty41%
AI Score42

4 Papers

LGJan 8Code
A Multimodal Data Processing Pipeline for MIMIC-IV Dataset

Farzana Islam Adiba, Varsha Danduri, Fahmida Liza Piya et al.

The MIMIC-IV dataset is a large, publicly available electronic health record (EHR) resource widely used for clinical machine learning research. It comprises multiple modalities, including structured data, clinical notes, waveforms, and imaging data. Working with these disjointed modalities requires an extensive manual effort to preprocess and align them for downstream analysis. While several pipelines for MIMIC-IV data extraction are available, they target a small subset of modalities or do not fully support arbitrary downstream applications. In this work, we greatly expand our prior popular unimodal pipeline and present a comprehensive and customizable multimodal pipeline that can significantly reduce multimodal processing time and enhance the reproducibility of MIMIC-based studies. Our pipeline systematically integrates the listed modalities, enabling automated cohort selection, temporal alignment across modalities, and standardized multimodal output formats suitable for arbitrary static and time-series downstream applications. We release the code, a simple UI, and a Python package for selective integration (with embedding) at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.

CLFeb 23
AgenticSum: An Agentic Inference-Time Framework for Faithful Clinical Text Summarization

Fahmida Liza Piya, Rahmatollah Beheshti

Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due to the length, noise, and heterogeneity of clinical documentation. We present AgenticSum, an inference-time, agentic framework that separates context selection, generation, verification, and targeted correction to reduce hallucinated content. The framework decomposes summarization into coordinated stages that compress task-relevant context, generate an initial draft, identify weakly supported spans using internal attention grounding signals, and selectively revise flagged content under supervisory control. We evaluate AgenticSum on two public datasets, using reference-based metrics, LLM-as-a-judge assessment, and human evaluation. Across various measures, AgenticSum demonstrates consistent improvements compared to vanilla LLMs and other strong baselines. Our results indicate that structured, agentic design with targeted correction offers an effective inference time solution to improve clinical note summarization using LLMs.

LGMar 26, 2024Code
HealthGAT: Node Classifications in Electronic Health Records using Graph Attention Networks

Fahmida Liza Piya, Mehak Gupta, Rahmatollah Beheshti

While electronic health records (EHRs) are widely used across various applications in healthcare, most applications use the EHRs in their raw (tabular) format. Relying on raw or simple data pre-processing can greatly limit the performance or even applicability of downstream tasks using EHRs. To address this challenge, we present HealthGAT, a novel graph attention network framework that utilizes a hierarchical approach to generate embeddings from EHR, surpassing traditional graph-based methods. Our model iteratively refines the embeddings for medical codes, resulting in improved EHR data analysis. We also introduce customized EHR-centric auxiliary pre-training tasks to leverage the rich medical knowledge embedded within the data. This approach provides a comprehensive analysis of complex medical relationships and offers significant advancement over standard data representation techniques. HealthGAT has demonstrated its effectiveness in various healthcare scenarios through comprehensive evaluations against established methodologies. Specifically, our model shows outstanding performance in node classification and downstream tasks such as predicting readmissions and diagnosis classifications. Our code is available at https://github.com/healthylaife/HealthGAT

CLApr 23, 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs

Fahmida Liza Piya, Rahmatollah Beheshti

Unstructured clinical data can serve as a unique and rich source of information that can meaningfully inform clinical practice. Extracting the most pertinent context from such data is critical for exploiting its true potential toward optimal and timely decision-making in patient care. While prior research has explored various methods for clinical text summarization, most prior studies either process all input tokens uniformly or rely on heuristic-based filters, which can overlook nuanced clinical cues and fail to prioritize information critical for decision-making. In this study, we propose Contextual, a novel framework that integrates a Context-Preserving Token Filtering method with a Domain-Specific Knowledge Graph (KG) for contextual augmentation. By preserving context-specific important tokens and enriching them with structured knowledge, ConTextual improves both linguistic coherence and clinical fidelity. Our extensive empirical evaluations on two public benchmark datasets demonstrate that ConTextual consistently outperforms other baselines. Our proposed approach highlights the complementary role of token-level filtering and structured retrieval in enhancing both linguistic and clinical integrity, as well as offering a scalable solution for improving precision in clinical text generation.