CV · Dec 8, 2022

Latent Graph Representations for Critical View of Safety Assessment

arXiv:2212.04155v4 · 48 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate and cost-effective surgical safety assessment for medical professionals, representing an incremental improvement over prior methods by reducing annotation requirements.

The paper tackles the assessment of the critical view of safety in laparoscopic cholecystectomy. It proposes a method that uses disentangled latent scene graphs and graph neural networks to encode semantic and visual features, achieving state-of-the-art performance with reduced reliance on expensive segmentation annotations.

Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information (object location, class information, geometric relations) to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
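
To make the kind of information a scene graph carries more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: in the paper the graph is latent and learned, and visual features come from a trained network. Here the class names, box coordinates, feature vectors, the hand-written `relation` rules, and the mean-aggregation step are all illustrative assumptions. The sketch only shows how object location, class information, and geometric relations can sit on nodes and edges, and how one generic message-passing round mixes per-node features.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Node:
    label: str      # class information (hypothetical class names)
    box: tuple      # object location as (x1, y1, x2, y2)
    features: list  # stand-in for learned visual features


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def intersection_area(a, b):
    # Area of the overlap between two axis-aligned boxes (0 if disjoint).
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def relation(a, b):
    # Coarse, hand-written geometric relation of box a with respect to box b.
    if intersection_area(a, b) > 0:
        return "overlaps"
    (ax, ay), (bx, by) = center(a), center(b)
    if abs(ax - bx) >= abs(ay - by):
        return "left_of" if ax < bx else "right_of"
    return "above" if ay < by else "below"


def build_scene_graph(nodes):
    # Fully connected directed graph: one labeled edge per ordered node pair.
    return {
        (i, j): relation(nodes[i].box, nodes[j].box)
        for i, j in permutations(range(len(nodes)), 2)
    }


def message_pass(nodes, edges):
    # One generic GNN-style step: average each node's features with the
    # mean of its neighbors' features. Edge labels are ignored here.
    updated = []
    for i, node in enumerate(nodes):
        neighbors = [nodes[dst].features for (src, dst) in edges if src == i]
        if not neighbors:
            updated.append(list(node.features))
            continue
        mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        updated.append([(own + m) / 2 for own, m in zip(node.features, mean)])
    return updated


# Made-up boxes and features, for illustration only.
nodes = [
    Node("gallbladder", (10, 10, 60, 50), [1.0, 0.0]),
    Node("cystic_duct", (50, 40, 80, 60), [0.0, 1.0]),
    Node("cystic_artery", (100, 40, 130, 60), [0.0, 0.0]),
]
edges = build_scene_graph(nodes)
new_features = message_pass(nodes, edges)
```

In a learned model, the updated node features would feed a classifier for the CVS prediction; that final step is omitted here because the abstract does not specify it.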

Code Implementations: 1 repo
