Asiful Arefeen

LG
h-index: 55
10 papers
56 citations
Novelty: 47%
AI Score: 44

10 Papers

LG · Jan 21 · Code
Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation

Shovito Barua Soumma, Asiful Arefeen, Stephanie M. Carpenter et al.

Counterfactual explanations (CFs) provide human-centric interpretability by identifying the minimal, actionable changes required to alter a machine learning model's prediction. Therefore, CFs can be used as (i) interventions for abnormality prevention and (ii) augmented data for training robust models. We conduct a comprehensive evaluation of CF generation using large language models (LLMs), including GPT-4 (zero-shot and few-shot) and two open-source models, BioMistral-7B and LLaMA-3.1-8B, in both pretrained and fine-tuned configurations. Using the multimodal AI-READI clinical dataset, we assess CFs across three dimensions: intervention quality, feature diversity, and augmentation effectiveness. Fine-tuned LLMs, particularly LLaMA-3.1-8B, produce CFs with high plausibility (up to 99%), strong validity (up to 0.99), and realistic, behaviorally modifiable feature adjustments. When used for data augmentation under controlled label-scarcity settings, LLM-generated CFs substantially restore classifier performance, yielding an average 20% F1 recovery across three scarcity scenarios. Compared with optimization-based baselines such as DiCE, CFNOW, and NICE, LLMs offer a flexible, model-agnostic approach that generates more clinically actionable and semantically coherent counterfactuals. Overall, this work demonstrates the promise of LLM-driven counterfactuals for both interpretable intervention design and data-efficient model training in sensor-based digital health. Impact: SenseCF fine-tunes an LLM to generate valid, representative counterfactual explanations and to supplement the minority class in an imbalanced dataset, improving model training and boosting model robustness and predictive performance.

AI · Oct 2, 2023
Designing User-Centric Behavioral Interventions to Prevent Dysglycemia with Novel Counterfactual Explanations

Asiful Arefeen, Hassan Ghasemzadeh

Monitoring unexpected health events and taking actionable measures to avert them beforehand is central to maintaining health and preventing disease. Therefore, a tool capable of predicting adverse health events and offering users actionable feedback about how to make changes in their diet, exercise, and medication to prevent abnormal health events could have significant societal impacts. Counterfactual explanations can provide insights into why a model made a particular prediction by generating hypothetical instances that are similar to the original input but lead to a different prediction outcome. Therefore, counterfactuals can be viewed as a means to design AI-driven health interventions to not only predict but also prevent adverse health outcomes such as blood glucose spikes, diabetes, and heart disease. In this paper, we design ExAct, a novel model-agnostic framework for generating counterfactual explanations for chronic disease prevention and management. Leveraging insights from adversarial learning, ExAct characterizes the decision boundary for high-dimensional data and performs a grid search to generate actionable interventions. ExAct is unique in integrating prior knowledge about user preferences for feasible explanations into the process of counterfactual generation. ExAct is evaluated extensively using four real-world datasets and external simulators. With 82.8% average validity in the simulation-aided validation, ExAct surpasses the state-of-the-art techniques for generating counterfactual explanations by at least 10%. In addition, counterfactuals from ExAct exhibit at least 6.6% improved proximity compared to previous research.
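The grid-search idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the ExAct implementation: the decision-boundary characterization is not reproduced, and the toy classifier, feature names, candidate grids, and the unscaled L1 distance are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence


def grid_search_counterfactual(
    x: Sequence[float],
    predict: Callable[[Sequence[float]], int],
    grids: Dict[int, Sequence[float]],
    target: int,
) -> Optional[List[float]]:
    """Return the closest grid point (L1 distance) that the model labels `target`.

    `grids` maps a feature index to the candidate values the user is willing
    to try; features without an entry are left unchanged.
    """
    indices = sorted(grids)
    best, best_dist = None, float("inf")
    for combo in product(*(grids[i] for i in indices)):
        candidate = list(x)
        for i, value in zip(indices, combo):
            candidate[i] = value
        if predict(candidate) != target:
            continue  # candidate does not flip the prediction
        dist = sum(abs(a - b) for a, b in zip(x, candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


# Hypothetical toy model: predicts a glucose spike (1) when
# carbs_g - 0.5 * exercise_min exceeds 50.
def toy_spike_model(features: Sequence[float]) -> int:
    carbs_g, exercise_min = features
    return 1 if carbs_g - 0.5 * exercise_min > 50 else 0


original = [80, 0]  # 80 g carbohydrate, no exercise -> spike predicted
allowed = {0: [80, 60, 40], 1: [0, 30, 60]}
suggestion = grid_search_counterfactual(original, toy_spike_model, allowed, target=0)
print(suggestion)
```

Restricting the grids to values a user finds acceptable is one simple way to encode preferences; features left out of `grids` are never modified.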

LG · Mar 5, 2025 · Code
LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet

Abdullah Mamun, Asiful Arefeen, Susan B. Racette et al.

Postprandial hyperglycemia, marked by the blood glucose level exceeding the normal range after consuming a meal, is a critical indicator of progression toward type 2 diabetes in people with prediabetes and in healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). Predicting postprandial AUC in advance based on a person's lifestyle factors, such as diet and physical activity level, and explaining the factors that affect postprandial blood glucose could allow an individual to adjust their lifestyle accordingly to maintain normal glucose levels. In this study, we developed an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models to predict postprandial AUC and hyperglycemia from diet, physical activity, and recent glucose patterns. We used data obtained from wearables in a five-week clinical trial of 10 adults who worked full-time to develop and evaluate the proposed computational model that integrates wearable sensing, multimodal data, and machine learning. Our machine learning model takes multimodal data from wearable activity and glucose monitoring sensors, along with food and work logs, and provides an interpretable prediction of the postprandial glucose pattern. Our GlucoLens system achieves a normalized root mean squared error (NRMSE) of 0.123 in its best configuration. On average, the proposed technology performs 16% better than the comparison models. Additionally, our technique predicts hyperglycemia with an accuracy of 73.3% and an F1 score of 0.716 and recommends different treatment options to help avoid hyperglycemia through diverse counterfactual explanations. Code available: https://github.com/ab9mamun/GlucoLens.
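To make the two quantities in this abstract concrete, the sketch below computes a postprandial AUC with the trapezoidal rule and an NRMSE. It is an illustration only, not the GlucoLens code: the function names are hypothetical, and normalizing the RMSE by the range of the true values is an assumption, since the abstract does not say which normalization is used.

```python
import math
from typing import Sequence


def trapezoid_auc(times_min: Sequence[float], glucose_mgdl: Sequence[float]) -> float:
    """Area under a glucose curve via the trapezoidal rule (mg/dL x minutes)."""
    if len(times_min) != len(glucose_mgdl) or len(times_min) < 2:
        raise ValueError("need at least two matching time/glucose samples")
    pairs = list(zip(times_min, glucose_mgdl))
    return sum(
        (t1 - t0) * (g0 + g1) / 2.0
        for (t0, g0), (t1, g1) in zip(pairs, pairs[1:])
    )


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the range of y_true (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("y_true has zero range; NRMSE is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / spread


# Readings at 0, 30, and 60 minutes after a meal (made-up values).
auc = trapezoid_auc([0, 30, 60], [100, 160, 120])
print(auc)
```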

AI · Jul 7, 2025 · Code
SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation

Shovito Barua Soumma, Asiful Arefeen, Stephanie M. Carpenter et al.

Counterfactual explanations (CFs) offer human-centric insights into machine learning predictions by highlighting minimal changes required to alter an outcome. Therefore, CFs can be used as (i) interventions for abnormality prevention and (ii) augmented data for training robust models. In this work, we explore large language models (LLMs), specifically GPT-4o-mini, for generating CFs in zero-shot and three-shot settings. We evaluate our approach on two datasets: the AI-READI flagship dataset for stress prediction and a public dataset for heart disease detection. Compared to traditional methods such as DiCE, CFNOW, and NICE, our few-shot LLM-based approach achieves high plausibility (up to 99%), strong validity (up to 0.99), and competitive sparsity. Moreover, using LLM-generated CFs as augmented samples improves downstream classifier performance (an average accuracy gain of 5%), especially in low-data regimes. This demonstrates the potential of prompt-based generative techniques to enhance explainability and robustness in clinical and physiological prediction tasks. Code base: github.com/shovito66/SenseCF.
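Validity and sparsity are commonly defined as the share of counterfactuals that reach the desired label and the number of features changed. The sketch below follows those common definitions, which may differ in detail from the ones used in the paper; the toy stress classifier and feature layout are assumptions.

```python
from typing import Callable, Sequence


def validity(
    predict: Callable[[Sequence[float]], int],
    counterfactuals: Sequence[Sequence[float]],
    target_label: int,
) -> float:
    """Fraction of counterfactuals that the model assigns to `target_label`."""
    if not counterfactuals:
        return 0.0
    hits = sum(1 for cf in counterfactuals if predict(cf) == target_label)
    return hits / len(counterfactuals)


def sparsity(
    original: Sequence[float], counterfactual: Sequence[float], tol: float = 1e-9
) -> int:
    """Number of features whose value differs from the original (fewer is sparser)."""
    return sum(1 for a, b in zip(original, counterfactual) if abs(a - b) > tol)


# Hypothetical toy model: "stressed" (1) when sleep is under 6 hours.
def toy_stress_model(features: Sequence[float]) -> int:
    sleep_hours, _steps = features
    return 1 if sleep_hours < 6 else 0


original = [5.0, 3000]
candidates = [[7.0, 3000], [5.5, 8000], [6.5, 6000]]
print(validity(toy_stress_model, candidates, target_label=0))
print([sparsity(original, cf) for cf in candidates])
```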

AI · Feb 28, 2025
NutriGen: Personalized Meal Plan Generator Leveraging Large Language Models to Enhance Dietary and Nutritional Adherence

Saman Khamesian, Asiful Arefeen, Stephanie M. Carpenter et al.

Maintaining a balanced diet is essential for overall health, yet many individuals struggle with meal planning due to nutritional complexity, time constraints, and lack of dietary knowledge. Personalized food recommendations can help address these challenges by tailoring meal plans to individual preferences, habits, and dietary restrictions. However, existing dietary recommendation systems often lack adaptability, fail to consider real-world constraints such as food ingredient availability, and require extensive user input, making them impractical for sustainable and scalable daily use. To address these limitations, we introduce NutriGen, a framework based on large language models (LLMs) designed to generate personalized meal plans that align with user-defined dietary preferences and constraints. By building a personalized nutrition database and leveraging prompt engineering, our approach enables LLMs to incorporate reliable nutritional references like the USDA nutrition database while maintaining flexibility and ease of use. We demonstrate that LLMs have strong potential in generating accurate and user-friendly food recommendations, addressing key limitations in existing dietary recommendation systems by providing structured, practical, and scalable meal plans. Our evaluation shows that Llama 3.1 8B and GPT-3.5 Turbo achieve the lowest percentage errors of 1.55% and 3.68%, respectively, producing meal plans that closely align with user-defined caloric targets while minimizing deviation and improving precision. Additionally, we compared the performance of DeepSeek V3 against several established models to evaluate its potential in personalized nutrition planning.

LG · Feb 20, 2025
Type 1 Diabetes Management using GLIMMER: Glucose Level Indicator Model with Modified Error Rate

Saman Khamesian, Asiful Arefeen, Maria Adela Grando et al.

Managing Type 1 Diabetes (T1D) demands constant vigilance as individuals strive to regulate their blood glucose levels to avoid the harmful effects of dysglycemia, including both hyperglycemia and hypoglycemia. Despite the development of advanced technologies such as automated insulin delivery (AID) systems, achieving optimal glycemic control remains challenging. AID systems combine continuous subcutaneous insulin infusion with data from continuous glucose monitors (CGMs), offering potential benefits in reducing glucose variability and increasing time-in-range. However, these systems still frequently fail to prevent dysglycemia, partly due to limitations in their prediction algorithms, which lack the accuracy needed to avert abnormal glucose events. This shortcoming highlights the need for more advanced glucose forecasting methods. To address this need, we introduce GLIMMER, Glucose Level Indicator Model with Modified Error Rate, a machine learning-based model for predicting blood glucose levels. GLIMMER classifies glucose values into normal and abnormal ranges and employs a novel custom loss function that prioritizes accuracy in dysglycemic regions, where patient safety is most critical. To evaluate GLIMMER's effectiveness for T1D management, we used both a publicly available dataset and a newly collected dataset involving 25 individuals with T1D. In forecasting glucose levels for the next hour, GLIMMER achieved a root mean square error (RMSE) of 23.97 (±3.77) mg/dL and a mean absolute error (MAE) of 15.83 (±2.09) mg/dL. These results represent a 23% improvement in RMSE and a 31% improvement in MAE compared to the best previously reported models.
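One simple way to prioritize accuracy in dysglycemic regions is to weight the squared error more heavily when the true glucose value falls outside a normal range. The sketch below shows that idea only; it is not GLIMMER's loss function, whose form the abstract does not give. The 70 and 180 mg/dL bounds and the penalty factor are assumptions chosen for illustration.

```python
from typing import Sequence


def range_weighted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    low: float = 70.0,     # assumed lower bound of the normal range, mg/dL
    high: float = 180.0,   # assumed upper bound of the normal range, mg/dL
    penalty: float = 2.0,  # assumed extra weight for dysglycemic samples
) -> float:
    """Mean squared error with a larger weight on out-of-range true values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        weight = penalty if (truth < low or truth > high) else 1.0
        total += weight * (truth - pred) ** 2
    return total / len(y_true)


# Same absolute error (10 mg/dL) in range and out of range (made-up values).
loss = range_weighted_mse([100.0, 200.0], [110.0, 190.0])
print(loss)
```

With `penalty=1.0` the function reduces to an ordinary mean squared error, which makes the effect of the weighting easy to compare.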

LG · May 27, 2025
AZT1D: A Real-World Dataset for Type 1 Diabetes

Saman Khamesian, Asiful Arefeen, Bithika M. Thompson et al.

High-quality real-world datasets are essential for advancing data-driven approaches in type 1 diabetes (T1D) management, including personalized therapy design, digital twin systems, and glucose prediction models. However, progress in this area has been limited by the scarcity of publicly available datasets that offer detailed and comprehensive patient data. To address this gap, we present AZT1D, a dataset containing data collected from 25 individuals with T1D on automated insulin delivery (AID) systems. AZT1D includes continuous glucose monitoring (CGM) data, insulin pump and insulin administration data, carbohydrate intake, and device mode (regular, sleep, and exercise) obtained over 6 to 8 weeks for each patient. Notably, the dataset provides granular details on bolus insulin delivery (i.e., total dose, bolus type, correction-specific amounts), features that are rarely found in existing datasets. By offering rich, naturalistic data, AZT1D supports a wide range of artificial intelligence and machine learning applications aimed at improving clinical decision-making and individualized care in T1D.

LG · Apr 14, 2025
GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals

Asiful Arefeen, Saman Khamesian, Maria Adela Grando et al.

Frequent and long-term exposure to hyperglycemia (i.e., high blood glucose) increases the risk of chronic complications such as neuropathy, nephropathy, and cardiovascular disease. Current technologies like continuous subcutaneous insulin infusion (CSII) and continuous glucose monitoring (CGM) primarily model specific aspects of glycemic control, such as hypoglycemia prediction or insulin delivery. Similarly, most digital twin approaches in diabetes management simulate only physiological processes. These systems lack the ability to offer alternative treatment scenarios that support proactive behavioral interventions. To address this, we propose GlyTwin, a novel digital twin framework that uses counterfactual explanations to simulate optimal treatments for glucose regulation. Our approach helps patients and caregivers modify behaviors like carbohydrate intake and insulin dosing to avoid abnormal glucose events. GlyTwin generates behavioral treatment suggestions that proactively prevent hyperglycemia by recommending small adjustments to daily choices, reducing both frequency and duration of these events. Additionally, it incorporates stakeholder preferences into the intervention design, making recommendations patient-centric and tailored. We evaluate GlyTwin on AZT1D, a newly constructed dataset with longitudinal data from 21 type 1 diabetes (T1D) patients on automated insulin delivery systems over 26 days. Results show GlyTwin outperforms state-of-the-art counterfactual methods, generating 76.6% valid and 86% effective interventions. These findings demonstrate the promise of counterfactual-driven digital twins in delivering personalized healthcare.

LG · Aug 14, 2025
RealAC: A Domain-Agnostic Framework for Realistic and Actionable Counterfactual Explanations

Asiful Arefeen, Shovito Barua Soumma, Hassan Ghasemzadeh

Counterfactual explanations provide human-understandable reasoning for AI-made decisions by describing minimal changes to input features that would alter a model's prediction. To be truly useful in practice, such explanations must be realistic and feasible: they should respect both the underlying data distribution and user-defined feasibility constraints. Existing approaches often enforce inter-feature dependencies through rigid, hand-crafted constraints or domain-specific knowledge, which limits their generalizability and ability to capture complex, nonlinear relations inherent in data. Moreover, they rarely accommodate user-specified preferences and often suggest explanations that are causally implausible or infeasible to act upon. We introduce RealAC, a domain-agnostic framework for generating realistic and actionable counterfactuals. RealAC automatically preserves complex inter-feature dependencies, without relying on explicit domain knowledge, by aligning the joint distributions of feature pairs between factual and counterfactual instances. The framework also allows end-users to "freeze" attributes they cannot or do not wish to change by suppressing change in frozen features during optimization. Evaluations on three synthetic and two real datasets demonstrate that RealAC balances realism with actionability. Our method outperforms state-of-the-art baselines and large language model-based counterfactual generation techniques in causal edge score, dependency preservation score, and the IM1 realism metric, and offers a solution for causality-aware and user-centric counterfactual generation.
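The "freeze" mechanism can be pictured as masking the update for frozen features during an iterative search. The sketch below shows only that masking idea on a linear scorer; it is not the RealAC optimizer, and it omits the distribution-alignment term entirely. The linear model, step size, and function name are assumptions.

```python
from typing import List, Optional, Sequence, Set


def counterfactual_with_frozen(
    x: Sequence[float],
    weights: Sequence[float],
    bias: float,
    frozen: Set[int],
    step: float = 0.25,
    max_iters: int = 1000,
) -> Optional[List[float]]:
    """Move x until the linear score w.x + b becomes positive.

    The gradient of the score with respect to x is `weights`; updates are
    skipped for every index in `frozen`, so those features never change.
    Returns None if the score is still non-positive after `max_iters` steps.
    """
    cf = list(x)
    for _ in range(max_iters):
        score = sum(w * v for w, v in zip(weights, cf)) + bias
        if score > 0:
            return cf
        for i, w in enumerate(weights):
            if i not in frozen:
                cf[i] += step * w  # gradient ascent on the unfrozen features only
    return None


# Feature 0 is frozen (for example, age); only feature 1 may change.
result = counterfactual_with_frozen([1.0, 1.0], [1.0, 1.0], bias=-3.0, frozen={0})
print(result)
```

If every feature with a nonzero weight is frozen, no valid counterfactual exists under this sketch and the function returns `None`.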

SP · Jul 1, 2021
Inter-Beat Interval Estimation with Tiramisu Model: A Novel Approach with Reduced Error

Asiful Arefeen, Ali Akbari, Seyed Iman Mirzadeh et al.

Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV) which, in turn, can provide early indication of potential cardiovascular diseases. However, extracting IBIs from noisy signals is challenging since the morphology of the signal is distorted in the presence of noise. The electrocardiogram (ECG) of a person in heavy motion is highly corrupted with noise, known as motion artifact, and the IBIs extracted from it are inaccurate. As a part of remote health monitoring and wearable system development, denoising ECG signals and estimating IBIs correctly from them have become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have recently been used successfully in signal denoising, and the diagnosis process has become easier, leading to accuracy levels that were previously unachievable. We propose a deep-learning approach leveraging a tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately, expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR as low as -30 dB with an average root mean square error (RMSE) of 13 milliseconds for estimated IBIs. At this noise level, our error percentage remains below 8% and outperforms other state-of-the-art techniques.
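Once R-peaks have been located, IBIs are the time differences between consecutive peaks, and the reported error is an RMSE over those intervals. The sketch below covers only that last step, assuming R-peak sample indices are already available; the denoising model and peak detector are not reproduced, and the 250 Hz sampling rate and example values are made up.

```python
import math
from typing import List, Sequence


def ibis_from_r_peaks(peak_samples: Sequence[int], fs_hz: float) -> List[float]:
    """Inter-beat intervals in milliseconds from R-peak sample indices."""
    if fs_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return [
        (later - earlier) * 1000.0 / fs_hz
        for earlier, later in zip(peak_samples, peak_samples[1:])
    ]


def rmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.sqrt(mse)


# R-peaks detected at these sample indices in a 250 Hz recording (made-up).
estimated_ibis = ibis_from_r_peaks([0, 200, 400, 625], fs_hz=250.0)
reference_ibis = [810.0, 790.0, 900.0]
print(estimated_ibis, rmse(reference_ibis, estimated_ibis))
```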