Development, Evaluation, and Deployment of a Multi-Agent System for Thoracic Tumor Board
This work demonstrates a practical integration of AI into the clinical workflow for generating patient case summaries, reducing the manual effort of tumor board preparation.
The authors developed and deployed an automated AI chart summarization system for thoracic tumor boards, achieving performance comparable to physician-written summaries as measured by fact-based scoring rubrics, and validated an LLM-as-judge evaluation strategy.
Tumor boards are multidisciplinary conferences dedicated to producing actionable patient care recommendations, with live review of primary radiology and pathology data. Succinct patient case summaries are needed to drive efficient and accurate case discussions. We first developed a manual AI-based workflow to generate patient summaries for live display at the Stanford Thoracic Tumor Board. To improve on this manually intensive process, we developed several automated AI chart summarization methods and evaluated them against physician-written gold-standard summaries using fact-based scoring rubrics. We report these comparative evaluations, as well as our deployment of the final automated AI chart summarization tool and its post-deployment monitoring. We also validate the use of an LLM-as-a-judge evaluation strategy for fact-based scoring. This work is an example of integrating AI-based workflows into routine clinical practice.
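As an illustration of what fact-based rubric scoring with a pluggable judge might look like, the following is a minimal Python sketch and not the authors' implementation. The function names (`rubric_score`, `keyword_judge`, `judge_agreement`), the rubric format (a list of fact strings), and the example patient text are all hypothetical assumptions made for illustration. In practice the judge would be an LLM call or a physician reviewer; here a naive substring matcher stands in so that the example runs on its own.

```python
from typing import Callable, List

# A judge decides whether a summary states a given rubric fact.
# Hypothetical interface: in practice this could wrap an LLM call or a human label.
Judge = Callable[[str, str], bool]


def keyword_judge(summary: str, fact: str) -> bool:
    """Naive stand-in judge: case-insensitive substring match (assumption, not an LLM)."""
    return fact.lower() in summary.lower()


def rubric_score(summary: str, rubric_facts: List[str], judge: Judge) -> float:
    """Return the fraction of rubric facts that the judge finds in the summary."""
    if not rubric_facts:
        raise ValueError("rubric_facts must not be empty")
    hits = sum(1 for fact in rubric_facts if judge(summary, fact))
    return hits / len(rubric_facts)


def judge_agreement(judge_labels: List[bool], human_labels: List[bool]) -> float:
    """Return the fraction of per-fact decisions where the judge matches the human label."""
    if len(judge_labels) != len(human_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for j, h in zip(judge_labels, human_labels) if j == h)
    return matches / len(human_labels)


# Entirely made-up example data, for illustration only.
example_rubric = ["stage IIIA", "EGFR exon 19 deletion", "right upper lobe"]
example_summary = "67-year-old with stage IIIA adenocarcinoma of the right upper lobe."

score = rubric_score(example_summary, example_rubric, keyword_judge)
```

Under these assumptions, the same `rubric_score` function can be applied to both an AI-generated and a physician-written summary of the same case, and `judge_agreement` gives one simple way to compare an automated judge's per-fact decisions against human labels.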