Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards
This work addresses the manual preparation of patient summaries for oncology specialists in Molecular Tumor Boards, offering a scalable AI solution that incrementally improves both automation and evaluation.
The paper tackles the labor-intensive and subjective process of creating patient summaries for Molecular Tumor Boards by introducing HAO, an LLM-driven AI agent that coordinates a multi-agent clinical workflow to generate accurate and comprehensive summaries. Its Patient History agent captured 94% of high-importance information when partial entailments are counted and reached a TBFact recall of 0.84 under strict entailment criteria. The paper also proposes TBFact, a model-as-a-judge framework that evaluates summary comprehensiveness and succinctness and that institutions can deploy locally without sharing sensitive clinical data.
Molecular Tumor Boards (MTBs) are multidisciplinary forums where oncology specialists collaboratively assess complex patient cases to determine optimal treatment strategies. A central element of this process is the patient summary, typically compiled by a medical oncologist, radiation oncologist, or surgeon, or their trained medical assistant, who distills heterogeneous medical records into a concise narrative to facilitate discussion. This manual approach is often labor-intensive, subjective, and prone to omissions of critical information. To address these limitations, we introduce the Healthcare Agent Orchestrator (HAO), a Large Language Model (LLM)-driven AI agent that coordinates a multi-agent clinical workflow to generate accurate and comprehensive patient summaries for MTBs. Evaluating predicted patient summaries against ground truth presents additional challenges due to stylistic variation, ordering, synonym usage, and phrasing differences, which complicate the measurement of both succinctness and completeness. To overcome these evaluation hurdles, we propose TBFact, a "model-as-a-judge" framework designed to assess the comprehensiveness and succinctness of generated summaries. Using a benchmark dataset derived from de-identified tumor board discussions, we applied TBFact to evaluate our Patient History agent. Results show that the agent captured 94% of high-importance information (including partial entailments) and achieved a TBFact recall of 0.84 under strict entailment criteria. We further demonstrate that TBFact enables a data-free evaluation framework that institutions can deploy locally without sharing sensitive clinical data. Together, HAO and TBFact establish a robust foundation for delivering reliable and scalable support to MTBs.
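The two headline figures differ in how partial entailments are treated, and a small example may make that distinction concrete. The sketch below is not the TBFact implementation: it assumes that a judge model has already labeled each reference fact as fully, partially, or not entailed by the generated summary, and that recall is the fraction of reference facts accepted. The names `Entailment`, `JudgedFact`, and `recall`, as well as the toy facts, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Entailment(Enum):
    """Hypothetical verdicts a judge model might assign to a reference fact."""
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass(frozen=True)
class JudgedFact:
    """One reference fact with an assumed importance flag and judge verdict."""
    text: str
    high_importance: bool
    verdict: Entailment


def recall(facts, *, count_partial=False, high_importance_only=False):
    """Fraction of reference facts the summary is judged to cover.

    count_partial=False mimics a strict criterion (only FULL counts);
    count_partial=True also accepts PARTIAL verdicts.
    """
    pool = [f for f in facts if f.high_importance or not high_importance_only]
    if not pool:
        raise ValueError("no reference facts to score")
    accepted = {Entailment.FULL}
    if count_partial:
        accepted.add(Entailment.PARTIAL)
    hits = sum(1 for f in pool if f.verdict in accepted)
    return hits / len(pool)


# Toy, invented judgments; not data from the paper.
facts = [
    JudgedFact("Stage IV lung adenocarcinoma", True, Entailment.FULL),
    JudgedFact("EGFR exon 19 deletion", True, Entailment.PARTIAL),
    JudgedFact("Prior carboplatin therapy", True, Entailment.FULL),
    JudgedFact("Former smoker", False, Entailment.NONE),
]

strict_recall = recall(facts)
lenient_high_importance = recall(
    facts, count_partial=True, high_importance_only=True
)
```

On this toy input, the strict score counts only the two fully entailed facts out of four, while the lenient score restricted to high-importance facts also accepts the partially entailed one. That is the same kind of gap that separates a strict-entailment recall from a capture rate that includes partial entailments.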