Ozanan Meireles

CV
h-index32
6papers
346citations
Novelty45%
AI Score44

6 Papers

CLJan 7, 2025Code
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding

John C. Rollman, Bruce Rogers, Hamed Zaribafzadeh et al.

Background: Healthcare has many manual processes that can benefit from automation and augmentation with Generative Artificial Intelligence (AI), the medical billing and coding process. However, current foundational Large Language Models (LLMs) perform poorly when tasked with generating accurate International Classification of Diseases, 10th edition, Clinical Modification (ICD-10-CM) and Current Procedural Terminology (CPT) codes. Additionally, there are many security and financial challenges in the application of generative AI to healthcare. We present a strategy for developing generative AI tools in healthcare, specifically for medical billing and coding, that balances accuracy, accessibility, and patient privacy. Methods: We fine tune the PHI-3 Mini and PHI-3 Medium LLMs using institutional data and compare the results against the PHI-3 base model, a PHI-3 RAG application, and GPT-4o. We use the post operative surgical report as input and the patients billing claim the associated ICD-10, CPT, and Modifier codes as the target result. Performance is measured by accuracy of code generation, proportion of invalid codes, and the fidelity of the billing claim format. Results: Both fine-tuned models performed better or as well as GPT-4o. The Phi-3 Medium fine-tuned model showed the best performance (ICD-10 Recall and Precision: 72%, 72%; CPT Recall and Precision: 77%, 79%; Modifier Recall and Precision: 63%, 64%). The Phi-3 Medium fine-tuned model only fabricated 1% of ICD-10 codes and 0.6% of CPT codes generated. Conclusions: Our study shows that a small model that is fine-tuned on domain-specific data for specific tasks using a simple set of open-source tools and minimal technological and monetary requirements performs as well as the larger contemporary consumer models.

CVFeb 3, 2024
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery

Lianhao Yin, Yutong Ban, Jennifer Eckhoff et al.

Understanding and anticipating intraoperative events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. Automated prediction of events, actions, and the following consequences is addressed through various computational approaches with the objective of augmenting surgeons' perception and decision-making capabilities. We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embedding of the graph. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets, and the achievement of the Critical View of Safety (CVS). Moreover, we address specific, safety-related tasks, such as predicting the clipping of cystic duct or artery without prior achievement of the CVS. Our results demonstrate the superiority of our approach compared to unstructured alternatives.

CVMar 12
Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation

Jian Jiang, Chenxi Lin, Yiming Gu et al.

Surgical scene understanding demands not only accurate predictions but also interpretable reasoning that surgeons can verify against clinical expertise. However, existing surgical vision-language models generate predictions without reasoning chains, and general-purpose reasoning models fail on compositional surgical tasks without domain-specific knowledge. We present Surg-R1, a surgical Vision-Language Model that addresses this gap through hierarchical reasoning trained via a four-stage pipeline. Our approach introduces three key contributions: (1) a three-level reasoning hierarchy decomposing surgical interpretation into perceptual grounding, relational understanding, and contextual reasoning; (2) the largest surgical chain-of-thought dataset with 320,000 reasoning pairs; and (3) a four-stage training pipeline progressing from supervised fine-tuning to group relative policy optimization and iterative self-improvement. Evaluation on SurgBench, comprising six public benchmarks and six multi-center external validation datasets from five institutions, demonstrates that Surg-R1 achieves the highest Arena Score (64.9%) on public benchmarks versus Gemini 3.0 Pro (46.1%) and GPT-5.1 (37.9%), outperforming both proprietary reasoning models and specialized surgical VLMs on the majority of tasks spanning instrument localization, triplet recognition, phase recognition, action recognition, and critical view of safety assessment, with a 15.2 percentage point improvement over the strongest surgical baseline on external validation.

CVSep 21, 2025
The SAGES Critical View of Safety Challenge: A Global Benchmark for AI-Assisted Surgical Quality Assessment

Deepak Alapatt, Jennifer Eckhoff, Zhiliang Lyu et al.

Advances in artificial intelligence (AI) for surgical quality assessment promise to democratize access to expertise, with applications in training, guidance, and accreditation. This study presents the SAGES Critical View of Safety (CVS) Challenge, the first AI competition organized by a surgical society, using the CVS in laparoscopic cholecystectomy, a universally recommended yet inconsistently performed safety step, as an exemplar of surgical quality assessment. A global collaboration across 54 institutions in 24 countries engaged hundreds of clinicians and engineers to curate 1,000 videos annotated by 20 surgical experts according to a consensus-validated protocol. The challenge addressed key barriers to real-world deployment in surgery, including achieving high performance, capturing uncertainty in subjective assessment, and ensuring robustness to clinical variability. To enable this scale of effort, we developed EndoGlacier, a framework for managing large, heterogeneous surgical video and multi-annotator workflows. Thirteen international teams participated, achieving up to a 17\% relative gain in assessment performance, over 80\% reduction in calibration error, and a 17\% relative improvement in robustness over the state-of-the-art. Analysis of results highlighted methodological trends linked to model performance, providing guidance for future research toward robust, clinically deployable AI for surgical quality assessment.

CYOct 30, 2020
Surgical Data Science -- from Concepts toward Clinical Translation

Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya et al.

Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.

CVSep 1, 2020
Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows

Yutong Ban, Guy Rosman, Thomas Ward et al.

Analyzing surgical workflow is crucial for surgical assistance robots to understand surgeries. With the understanding of the complete surgical workflow, the robots are able to assist the surgeons in intra-operative events, such as by giving a warning when the surgeon is entering specific keys or high-risk phases. Deep learning techniques have recently been widely applied to recognizing surgical workflows. Many of the existing temporal neural network models are limited in their capability to handle long-term dependencies in the data, instead, relying upon the strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics that are propagated by a sufficient statistics model (SSM). We implement our approach within an LSTM backbone for the task of surgical phase recognition and explore several choices for propagated statistics. We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets: the publicly available Cholec80 dataset and MGH100, a novel dataset with more challenging and clinically meaningful segment labels.