AIJan 20

LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health

arXiv:2601.13880v1h-index: 4
Originality Incremental advance
AI Analysis

This work addresses the problem of assessing LLM capabilities for personal health assistants, providing a benchmark and baseline for researchers, but it is incremental as it builds on existing LLM and sensing technologies.

The authors tackled the lack of systematic benchmarks for evaluating large language models (LLMs) in personalized digital health by introducing LifeAgentBench, a large-scale QA benchmark with 22,573 questions, and proposed LifeAgent, an agent that achieved significant improvements over baselines.

Personalized digital health support requires long-horizon, cross-dimensional reasoning over heterogeneous lifestyle signals, and recent advances in mobile sensing and large language models (LLMs) make such support increasingly feasible. However, the capabilities of current LLMs in this setting remain unclear due to the lack of systematic benchmarks. In this paper, we introduce LifeAgentBench, a large-scale QA benchmark for long-horizon, cross-dimensional, and multi-user lifestyle health reasoning, containing 22,573 questions spanning from basic retrieval to complex reasoning. We release an extensible benchmark construction pipeline and a standardized evaluation protocol to enable reliable and scalable assessment of LLM-based health assistants. We then systematically evaluate 11 leading LLMs on LifeAgentBench and identify key bottlenecks in long-horizon aggregation and cross-dimensional reasoning. Motivated by these findings, we propose LifeAgent as a strong baseline agent for health assistant that integrates multi-step evidence retrieval with deterministic aggregation, achieving significant improvements compared with two widely used baselines. Case studies further demonstrate its potential in realistic daily-life scenarios. The benchmark is publicly available at https://anonymous.4open.science/r/LifeAgentBench-CE7B.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes