CY | Apr 4, 2023 | Code
DiaTrend: A dataset from advanced diabetes technology to enable development of novel analytic solutions
Temiloluwa Prioleau, Abigail Bartolome, Richard Comi et al.
Objective digital data is scarce yet needed in many domains to enable research that can transform the standard of healthcare. While data from consumer-grade wearables and smartphones is more accessible, there is a critical need for similar data from clinical-grade devices used by patients with a diagnosed condition. The prevalence of wearable medical devices in the diabetes domain sets the stage for unique research and development within this field and beyond. However, the scarcity of open-source datasets presents a major barrier to progress. To facilitate broader research on diabetes-relevant problems and accelerate development of robust computational solutions, we provide the DiaTrend dataset. The DiaTrend dataset is composed of intensive longitudinal data from wearable medical devices, including a total of 27,561 days of continuous glucose monitor data and 8,220 days of insulin pump data from 54 patients with diabetes. This dataset is useful for developing novel analytic solutions that can reduce the disease burden for people living with diabetes and increase knowledge on chronic condition management in outpatient settings.
53.0 | AI | Apr 18
If Only My CGM Could Speak: A Privacy-Preserving Agent for Question Answering over Continuous Glucose Data
Yanjun Cui, Ali Emami, Temiloluwa Prioleau et al.
Continuous glucose monitors (CGMs) used in diabetes care collect rich personal health data that could improve day-to-day self-management. However, current patient platforms offer only static summaries, which do not support inquisitive user queries. Large language models (LLMs) could enable free-form inquiries about continuous glucose data, but deploying them over sensitive health records raises privacy and accuracy concerns. In this paper, we present CGM-Agent, a privacy-preserving framework for question answering over personal glucose data. In our design, the LLM serves purely as a reasoning engine that selects analytical functions. All computation occurs locally, and personal health data never leaves the user's device. For evaluation, we construct a benchmark of 4,180 questions combining parameterized question templates with real user queries and ground truth derived from deterministic program execution. Evaluating 6 leading LLMs, we find that top models achieve 94% value accuracy on synthetic queries and 88% on ambiguous real-world queries. Errors stem primarily from intent and temporal ambiguity rather than computational failures. Additionally, lightweight models achieve competitive performance in our agent design, suggesting opportunities for low-cost deployment. We release our code and benchmark to support future work on trustworthy health agents.
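The division of labor described in this abstract (the model only picks an analytical function, while the computation runs locally on the data) can be illustrated with a minimal sketch. This is not the CGM-Agent code: the function names, the sample readings, the 70-180 mg/dL range, and the keyword-based `select_function` stand-in for the LLM are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical local readings in mg/dL; in the described design they stay on the device.
READINGS = [95, 110, 150, 190, 210, 65, 120, 130]


def mean_glucose(readings):
    """Average glucose over the given readings."""
    return mean(readings)


def time_in_range(readings, low=70, high=180):
    """Percentage of readings that fall within [low, high] mg/dL."""
    in_range = sum(1 for r in readings if low <= r <= high)
    return 100.0 * in_range / len(readings)


# Registry of analytical functions that the selector is allowed to choose from.
TOOLS = {"mean_glucose": mean_glucose, "time_in_range": time_in_range}


def select_function(question):
    """Assumed stand-in for the LLM: it sees only the question text, never the readings."""
    q = question.lower()
    if "range" in q:
        return {"name": "time_in_range", "args": {"low": 70, "high": 180}}
    return {"name": "mean_glucose", "args": {}}


def answer(question, readings):
    """Run the selected function locally and deterministically on the readings."""
    call = select_function(question)
    func = TOOLS[call["name"]]
    return func(readings, **call["args"])
```

Because the selector returns only a function name and arguments, the readings are never passed to it, and the numeric answer comes from deterministic local code rather than from generated text.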
AI | Jul 18, 2025 | Code
Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
Temiloluwa Prioleau, Baiying Lu, Yanjun Cui
Artificial intelligence (AI) algorithms are a critical part of state-of-the-art digital health technology for diabetes management. Yet, limited access to large, high-quality datasets creates barriers that impede development of robust AI solutions. To accelerate development of transparent, reproducible, and robust AI solutions, we present Glucose-ML, a collection of 10 publicly available diabetes datasets released within the last 7 years (i.e., 2018 - 2025). The Glucose-ML collection comprises over 300,000 days of continuous glucose monitor (CGM) data with a total of 38 million glucose samples collected from 2500+ people across 4 countries. Participants include persons living with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. To support researchers and innovators in using this rich collection of diabetes datasets, we present a comparative analysis to guide algorithm developers in data selection. Additionally, we conduct a case study for the task of blood glucose prediction, one of the most common AI tasks within the field. Through this case study, we provide a benchmark for short-term blood glucose prediction across all 10 publicly available diabetes datasets within the Glucose-ML collection. We show that the same algorithm can have significantly different prediction results when developed/evaluated with different datasets. Findings from this study are then used to inform recommendations for developing robust AI solutions within the diabetes or broader health domain. We provide direct links to each longitudinal diabetes dataset in the Glucose-ML collection and openly provide our code.
LG | Feb 11, 2021
Feature Selection for Multivariate Time Series via Network Pruning
Kang Gu, Soroush Vosoughi, Temiloluwa Prioleau
In recent years, there has been an ever-increasing amount of multivariate time series (MTS) data in various domains, typically generated by a large family of sensors such as wearable devices. This has led to the development of novel learning methods on MTS data, with deep learning models dominating the most recent advancements. Prior literature has primarily focused on designing new network architectures for modeling temporal dependencies within MTS. However, a less studied challenge is associated with the high dimensionality of MTS data. In this paper, we propose a novel neural component, namely Neural Feature Selector (NFS), as an end-to-end solution for feature selection in MTS data. Specifically, NFS is based on a decomposed convolution design and includes two modules: first, each feature stream (a stream corresponds to a univariate series of the MTS) is processed independently by a temporal CNN; then an aggregating CNN combines the processed streams to produce input for other downstream networks. We evaluated the proposed NFS model on four real-world MTS datasets and found that it achieves comparable results with state-of-the-art methods while providing the benefit of feature selection. Our paper also highlights the robustness and effectiveness of feature selection with NFS compared to using recent autoencoder-based methods.
CY | Jul 24, 2020
Understanding Reflection Needs for Personal Health Data in Diabetes
Temiloluwa Prioleau, Ashutosh Sabharwal, Madhuri M. Vasudevan
To empower users of wearable medical devices, it is important to enable methods that facilitate reflection on previous care to improve future outcomes. In this work, we conducted a two-phase user study involving patients, caregivers, and clinicians to understand gaps in current approaches that support reflection and user needs for new solutions. Our results show that users desire specific summarization metrics, solutions that minimize cognitive effort, and solutions that enable data integration to support meaningful reflection on diabetes management. In addition, we developed and evaluated a visualization called PixelGrid that presents key metrics in a matrix-based plot. A majority of users (84%) found the matrix-based approach to be useful for identifying salient patterns related to certain times and days in blood glucose data. Through our evaluation, we identified that users desire data visualization solutions with complementary textual descriptors, concise and flexible presentation, contextually fitting content, and informative and actionable insights. Directions for future research on tools that automate pattern discovery, detect abnormalities, and provide recommendations to improve care were also identified.