Dominik Schneider

CV
3papers
Novelty43%
AI Score36

3 Papers

CVFeb 24
Towards Controllable Video Synthesis of Routine and Rare OR Events

Dominik Schneider, Lalithkumar Seenivasan, Sampath Rapuri et al.

Purpose: Curating large-scale datasets of operating room (OR) workflow, encompassing rare, safety-critical, or atypical events, remains operationally and ethically challenging. This data bottleneck complicates the development of ambient intelligence for detecting, understanding, and mitigating rare or safety-critical events in the OR. Methods: This work presents an OR video diffusion framework that enables controlled synthesis of rare and safety-critical events. The framework integrates a geometric abstraction module, a conditioning module, and a fine-tuned diffusion model to first transform OR scenes into abstract geometric representations, then condition the synthesis process, and finally generate realistic OR event videos. Using this framework, we also curate a synthetic dataset to train and validate AI models for detecting near-misses of sterile-field violations. Results: In synthesizing routine OR events, our method outperforms off-the-shelf video diffusion baselines, achieving lower FVD/LPIPS and higher SSIM/PSNR in both in- and out-of-domain datasets. Through qualitative results, we illustrate its ability for controlled video synthesis of counterfactual events. An AI model trained and validated on the generated synthetic data achieved a RECALL of 70.13% in detecting near safety-critical events. Finally, we conduct an ablation study to quantify performance gains from key design choices. Conclusion: Our solution enables controlled synthesis of routine and rare OR events from abstract geometric representations. Beyond demonstrating its capability to generate rare and safety-critical scenarios, we show its potential to support the development of ambient intelligence models.

65.7CVMar 13
SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider et al.

A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, tissue affordance mask, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectory points toward a visually faithful simulation engine.

SENov 30, 2021
Towards Automated Semantic Grouping in Workflows for Multi-Disciplinary Analysis

Dominik Schneider, Alexander Weinert

When designing multidisciplinary tool workflows in visual development environments, researchers and engineers often combine simulation tools which serve a functional purpose and helper tools that merely ensure technical compatibility by, e.g., converting between file formats. If the development environment does not offer native support for such groups of tools, maintainability of the developed workflow quickly deteriorates with an increase in complexity. We present an approach towards automatically identifying such groups of closely related tools in multidisciplinary workflows implemented in RCE by transforming the workflow into a graph and applying graph clustering algorithms to it. Further, we implement this approach and evaluate multiple clustering algorithms. Our results strongly indicate that this approach can yield groups of closely related tools in RCE workflows, but also that solutions to this problem will have to be tailor-made to each specific style of workflow design.