CL · AI · Nov 10, 2025

Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction

arXiv:2511.07392v2 · h-index: 14
AI Analysis

The work addresses a critical problem for surgeons in minimally invasive robotic surgery by enabling hands-free data interaction, though it is an incremental application of existing LLM methods to a new domain.

The paper tackles surgeons' inability to access patient data during da Vinci robotic surgery. It proposes a voice-directed Surgical Agent Orchestrator Platform (SAOP) in which LLM-based agents map voice commands to tasks such as retrieving information or manipulating scans, and it reports high accuracy and success rates across 240 voice commands.

In da Vinci robotic surgery, surgeons' hands and eyes are fully engaged in the procedure, making it difficult to access and manipulate multimodal patient data without interruption. We propose a voice-directed Surgical Agent Orchestrator Platform (SAOP) built on a hierarchical multi-agent framework, consisting of an orchestration agent and three task-specific agents driven by Large Language Models (LLMs). These LLM-based agents autonomously plan, refine, validate, and reason to map voice commands into specific tasks such as retrieving clinical information, manipulating CT scans, or navigating 3D anatomical models on the surgical video. We also introduce a Multi-level Orchestration Evaluation Metric (MOEM) to comprehensively assess the performance and robustness from command-level and category-level perspectives. The SAOP achieves high accuracy and success rates across 240 voice commands, while LLM-based agents improve robustness against speech recognition errors and diverse or ambiguous free-form commands, demonstrating strong potential to support minimally invasive da Vinci robotic surgery.
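The hierarchical dispatch described in the abstract (one orchestration agent delegating each voice command to one of three task-specific agents), together with the idea of scoring results both per command and per category, can be illustrated with a minimal Python sketch. It is not the authors' SAOP implementation or the MOEM definition. The LLM-based orchestration agent is replaced by a hypothetical keyword router (`keyword_router`), and the category labels, keyword sets, handler strings, `evaluate` helper, and example commands are all invented for illustration.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical labels for the three task-specific agents named in the abstract.
CLINICAL_INFO = "clinical_info"   # retrieve clinical information
CT_SCAN = "ct_scan"               # manipulate CT scans
ANATOMY_3D = "anatomy_3d"         # navigate 3D anatomical models

# Assumed keyword sets; SAOP itself uses LLM-based agents, not keyword rules.
CT_WORDS = {"ct", "scan", "slice", "axial"}
ANATOMY_WORDS = {"3d", "model", "rotate", "zoom"}


def keyword_router(command: str) -> str:
    """Toy stand-in for the orchestration agent: map a transcribed command to a category."""
    tokens = set(re.findall(r"[a-z0-9]+", command.lower()))
    if tokens & CT_WORDS:
        return CT_SCAN
    if tokens & ANATOMY_WORDS:
        return ANATOMY_3D
    return CLINICAL_INFO  # fall back to information retrieval


class Orchestrator:
    """Top-level agent that delegates each command to exactly one task-specific agent."""

    def __init__(self, route: Callable[[str], str], agents: Dict[str, Callable[[str], str]]):
        self.route = route
        self.agents = agents

    def dispatch(self, command: str) -> Tuple[str, str]:
        category = self.route(command)
        return category, self.agents[category](command)


# Hypothetical task agents: each just echoes what it would do.
agents: Dict[str, Callable[[str], str]] = {
    CLINICAL_INFO: lambda c: f"looking up record for: {c}",
    CT_SCAN: lambda c: f"updating CT viewer for: {c}",
    ANATOMY_3D: lambda c: f"moving 3D model for: {c}",
}


def evaluate(orch: Orchestrator, labelled: List[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """Return overall routing accuracy over commands and per-category accuracy."""
    total: Dict[str, int] = defaultdict(int)
    correct_by_cat: Dict[str, int] = defaultdict(int)
    correct = 0
    for command, expected in labelled:
        predicted, _ = orch.dispatch(command)
        hit = int(predicted == expected)
        correct += hit
        total[expected] += 1
        correct_by_cat[expected] += hit
    command_level = correct / len(labelled)
    category_level = {cat: correct_by_cat[cat] / total[cat] for cat in total}
    return command_level, category_level


if __name__ == "__main__":
    orchestrator = Orchestrator(keyword_router, agents)
    labelled = [
        ("Show the patient's latest creatinine level", CLINICAL_INFO),
        ("Go to the next CT slice", CT_SCAN),
        ("Rotate the kidney model to the left", ANATOMY_3D),
        # Simulated speech-recognition error: "CT" transcribed as "sea tea".
        ("Scroll the sea tea images down", CT_SCAN),
    ]
    command_level, category_level = evaluate(orchestrator, labelled)
    print(command_level)   # 0.75
    print(category_level)  # {'clinical_info': 1.0, 'ct_scan': 0.5, 'anatomy_3d': 1.0}
```

In this toy run, the simulated transcription error ("sea tea" for "CT") is routed to the wrong agent, so overall accuracy drops to 0.75 and the CT category scores 0.5. The abstract reports that LLM-based agents improve robustness against exactly this kind of speech recognition error; a keyword router is used here only because it runs without a model.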

