IV · AI · CV · LG · Apr 23, 2024

Interactive Generation of Laparoscopic Videos with Diffusion Models

arXiv:2406.06537v1 · 13 citations · h-index: 7 · DGM4MICCAI@MICCAI
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more efficient and practical surgical training methods, though it is incremental as it applies existing diffusion techniques to a specific domain.

The paper tackles the problem of generating realistic laparoscopic videos for surgical training by using diffusion models with text and segmentation mask guidance, achieving an FID of 38.097 and an F1-score of 0.71.

Generative AI in general, and synthetic visual data generation in particular, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as well as the pixel-wise F1-score for the spatial control of tool generation. We achieve an FID of 38.097 and an F1-score of 0.71.
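
To make the pixel-wise F1-score mentioned in the abstract concrete, the following is a minimal sketch of how such a score could be computed between a target tool mask and the tool mask recovered from a generated image. It is not the authors' evaluation code: the function name `pixel_f1`, the nested-list mask format, and the convention that two empty masks score 1.0 are assumptions made for illustration, and the summary does not say how the masks of generated images are obtained.

```python
def pixel_f1(pred_mask, true_mask):
    """Pixel-wise F1 between two binary masks (equally sized 2D lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # tool pixel predicted and present
            elif p and not t:
                fp += 1          # tool pixel predicted but absent in target
            elif t and not p:
                fn += 1          # tool pixel in target but missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 3x3 example: the target mask used for guidance vs. the mask
# recovered from a generated frame (both made up for this sketch).
target = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
generated = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
]

score = pixel_f1(generated, target)
print(f"pixel-wise F1: {score:.2f}")
```

In this toy case three pixels overlap, one is spurious, and one is missed, so the score is 2·3 / (2·3 + 1 + 1) = 0.75. A dataset-level figure such as the reported 0.71 would typically aggregate a score like this over many generated images, though the aggregation used in the paper is not stated here.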

