Meredith Franklin

LG
h-index11
6papers
37citations
Novelty44%
AI Score40

6 Papers

LGAug 14, 2023
Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Lianfa Li, Roxana Khalili, Frederick Lurmann et al.

Atmospheric nitrogen oxides (NOx) primarily from fuel combustion have recognized acute and chronic health and environmental effects. Machine learning (ML) methods have significantly enhanced our capacity to predict NOx concentrations at ground-level with high spatiotemporal resolution but may suffer from high estimation bias since they lack physical and chemical knowledge about air pollution dynamics. Chemical transport models (CTMs) leverage this knowledge; however, accurate predictions of ground-level concentrations typically necessitate extensive post-calibration. Here, we present a physics-informed deep learning framework that encodes advection-diffusion mechanisms and fluid dynamics constraints to jointly predict NO2 and NOx and reduce ML model bias by 21-42%. Our approach captures fine-scale transport of NO2 and NOx, generates robust spatial extrapolation, and provides explicit uncertainty estimation. The framework fuses knowledge-driven physicochemical principles of CTMs with the predictive power of ML for air quality exposure, health, and policy applications. Our approach offers significant improvements over purely data-driven ML methods and has unprecedented bias reduction in joint NO2 and NOx prediction.

MLApr 23, 2023
Quantile Extreme Gradient Boosting for Uncertainty Quantification

Xiaozhe Yin, Masoud Fallah-Shorshani, Rob McConnell et al.

As the availability, size and complexity of data have increased in recent years, machine learning (ML) techniques have become popular for modeling. Predictions resulting from applying ML models are often used for inference, decision-making, and downstream applications. A crucial yet often overlooked aspect of ML is uncertainty quantification, which can significantly impact how predictions from models are used and interpreted. Extreme Gradient Boosting (XGBoost) is one of the most popular ML methods given its simple implementation, fast computation, and sequential learning, which make its predictions highly accurate compared to other methods. However, techniques for uncertainty determination in ML models such as XGBoost have not yet been universally agreed among its varying applications. We propose enhancements to XGBoost whereby a modified quantile regression is used as the objective function to estimate uncertainty (QXGBoost). Specifically, we included the Huber norm in the quantile regression model to construct a differentiable approximation to the quantile regression error function. This key step allows XGBoost, which uses a gradient-based optimization algorithm, to make probabilistic predictions efficiently. QXGBoost was applied to create 90\% prediction intervals for one simulated dataset and one real-world environmental dataset of measured traffic noise. Our proposed method had comparable or better performance than the uncertainty estimates generated for regular and quantile light gradient boosting. For both the simulated and traffic noise datasets, the overall performance of the prediction intervals from QXGBoost were better than other models based on coverage width-based criterion.

HCMar 6
Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions

Dominik P. Hofer, Haochen Song, Rania Islambouli et al.

Behaviour Change Techniques (BCTs) are central to digital health interventions, yet selecting and delivering effective techniques remains challenging. Contextual bandits enable statistically grounded optimisation of BCT selection, while Large Language Models (LLMs) offer flexible, context-sensitive message generation. We conducted a 4-week study on physical activity motivation (N=54; 9 post-study interviews) that compared five daily messaging approaches: random templates, contextual bandit with templates, LLM generation, hybrid bandit+LLM, and LLM with interaction history. LLM-based approaches were rated substantially more helpful than templates, but no significant differences emerged among LLM conditions. Unexpectedly, bandit optimisation for BCTs selection yielded no additional perceived helpfulness compared with LLM-only approaches. Unconstrained LLMs focused heavily on a single BCT, whereas bandit systems enforced systematic exploration-exploitation across techniques. Quantitative and qualitative findings suggest contextual acknowledgement of user input drove perceived helpfulness. We contribute design suggestions for reflective AI health behaviour change systems that address a trade-off between structured exploration and generative autonomy.

LGJun 8, 2025
Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models

Haochen Song, Dominik Hofer, Rania Islambouli et al.

Machine learning approaches, such as contextual multi-armed bandit (cMAB) algorithms, offer a promising strategy to reduce sedentary behavior by delivering personalized interventions to encourage physical activity. However, cMAB algorithms typically require large participant samples to learn effectively and may overlook key psychological factors that are not explicitly encoded in the model. In this study, we propose a hybrid approach that combines cMAB for selecting intervention types with large language models (LLMs) to personalize message content. We evaluate four intervention types: behavioral self-monitoring, gain-framed, loss-framed, and social comparison, each delivered as a motivational message aimed at increasing motivation for physical activity and daily step count. Message content is further personalized using dynamic contextual factors including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over a seven-day trial, participants receive daily messages assigned by one of four models: cMAB alone, LLM alone, combined cMAB with LLM personalization (cMABxLLM), or equal randomization (RCT). Outcomes include daily step count and message acceptance, assessed via ecological momentary assessments (EMAs). We apply a causal inference framework to evaluate the effects of each model. Our findings offer new insights into the complementary roles of LLM-based personalization and cMAB adaptation in promoting physical activity through personalized behavioral messaging.

LGJan 7, 2025
Adaptive Experiments Under Data Sparse Settings: Applications for Educational Platforms

Haochen Song, Ilya Musabirov, Ananya Bhattacharjee et al.

Adaptive experimentation is increasingly used in educational platforms to personalize learning through dynamic content and feedback. However, standard adaptive strategies such as Thompson Sampling often underperform in real-world educational settings where content variations are numerous and student participation is limited, resulting in sparse data. In particular, Thompson Sampling can lead to imbalanced content allocation and delayed convergence on which aspects of content are most effective for student learning. To address these challenges, we introduce Weighted Allocation Probability Adjusted Thompson Sampling (WAPTS), an algorithm that refines the sampling strategy to improve content-related decision-making in data-sparse environments. WAPTS is guided by the principle of lenient regret, allowing near-optimal allocations to accelerate learning while still exploring promising content. We evaluate WAPTS in a learnersourcing scenario where students rate peer-generated learning materials, and demonstrate that it enables earlier and more reliable identification of promising treatments.

LGDec 10, 2021
Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction

Yijun Lin, Yao-Yi Chiang, Meredith Franklin et al.

Many scientific prediction problems have spatiotemporal data- and modeling-related challenges in handling complex variations in space and time using only sparse and unevenly distributed observations. This paper presents a novel deep learning architecture, Deep learning predictions for LocATion-dependent Time-sEries data (DeepLATTE), that explicitly incorporates theories of spatial statistics into neural networks to address these challenges. In addition to a feature selection module and a spatiotemporal learning module, DeepLATTE contains an autocorrelation-guided semi-supervised learning strategy to enforce both local autocorrelation patterns and global autocorrelation trends of the predictions in the learned spatiotemporal embedding space to be consistent with the observed data, overcoming the limitation of sparse and unevenly distributed observations. During the training process, both supervised and semi-supervised losses guide the updates of the entire network to: 1) prevent overfitting, 2) refine feature selection, 3) learn useful spatiotemporal representations, and 4) improve overall prediction. We conduct a demonstration of DeepLATTE using publicly available data for an important public health topic, air quality prediction, in a well-studied, complex physical environment - Los Angeles. The experiment demonstrates that the proposed approach provides accurate fine-spatial-scale air quality predictions and reveals the critical environmental factors affecting the results.