AI · CL · Oct 20, 2025

Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users

arXiv:2510.17173v2 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of personalizing health coaching for real users. The advance is incremental because it builds on existing offline policy evaluation methods.

The study evaluates a multi-turn LLM health coach with real users and looks for ways to improve it. On logged data, a uniform heavy-tool policy increased average value but harmed specific subgroups, most notably low-health-literacy/high-self-efficacy users. In a simulator, adding an early information-gain bonus shortened trait identification and improved goal success and pass@3.

We study a web-deployed, tool-augmented LLM health coach with real users. In a pilot with seven users (280 rated turns), offline policy evaluation (OPE) over factorized decision heads (Tool/Style) shows that a uniform heavy-tool policy raises average value on logs but harms specific subgroups, most notably low-health-literacy/high-self-efficacy users. A lightweight simulator with hidden archetypes further shows that adding a small early information-gain bonus reliably shortens trait identification and improves goal success and pass@3. Together, these early findings indicate an evaluation-first path to personalization: freeze the generator, learn subgroup-aware decision heads on typed rewards (objective tool outcomes and satisfaction), and always report per-archetype metrics to surface subgroup harms that averages obscure.
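The gap between the average and per-subgroup results can be illustrated with a small sketch. It assumes an inverse-propensity-scoring (IPS) estimator, which the abstract does not name, and uses a synthetic log. The archetype labels `A` and `B`, the field names, the 0.5 logging propensity, and all reward values are hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict

def ips_value(turns, target_prob):
    """IPS estimate of a target policy's value on logged turns."""
    total = 0.0
    for t in turns:
        # Importance weight: target probability of the logged action
        # divided by the logging policy's probability of that action.
        weight = target_prob(t["archetype"], t["tool"]) / t["propensity"]
        total += weight * t["reward"]
    return total / len(turns)

def logged_mean(turns):
    """Average reward actually observed under the logging policy."""
    return sum(t["reward"] for t in turns) / len(turns)

def per_archetype(turns, fn):
    """Apply an estimator separately to each archetype's turns."""
    groups = defaultdict(list)
    for t in turns:
        groups[t["archetype"]].append(t)
    return {name: fn(rows) for name, rows in groups.items()}

def always_heavy(archetype, tool):
    """Hypothetical uniform heavy-tool policy: always picks 'heavy'."""
    return 1.0 if tool == "heavy" else 0.0

def make_turn(archetype, tool, reward):
    # Assumed logging policy: uniform over two tool modes (propensity 0.5).
    return {"archetype": archetype, "tool": tool,
            "reward": reward, "propensity": 0.5}

# Synthetic log: archetype A benefits from heavy tool use, B is harmed by it.
logs = (
    [make_turn("A", "heavy", 1.0)] * 3
    + [make_turn("A", "light", 0.4)] * 3
    + [make_turn("B", "heavy", 0.2), make_turn("B", "light", 0.6)]
)

overall_logged = logged_mean(logs)
overall_target = ips_value(logs, always_heavy)
by_group_logged = per_archetype(logs, logged_mean)
by_group_target = per_archetype(
    logs, lambda rows: ips_value(rows, always_heavy)
)

print("overall:", overall_logged, "->", overall_target)
for name in sorted(by_group_logged):
    print(name, ":", by_group_logged[name], "->", by_group_target[name])
```

On this toy log the overall estimate rises (about 0.625 to 0.8) while archetype B falls (about 0.4 to 0.2), which is the kind of subgroup harm that an average alone would hide. With very few turns per archetype, as in a seven-user pilot, IPS estimates of this kind can have high variance, so per-archetype numbers are best read alongside their sample sizes.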

