Decoding the Surgical Scene: A Scoping Review of Scene Graphs in Surgery
This review addresses the problem of integrating scene graphs into surgical AI for improved safety and training. Its contribution is incremental: it surveys existing research rather than proposing new methods.
This scoping review maps the use of scene graphs in surgery, identifying a 'data divide' where real-world 2D video dominates internal-view tasks while simulated data is used for external-view 4D modeling, and notes that specialized foundation models now outperform generalist models in surgical contexts.
Scene graphs (SGs) provide structured relational representations crucial for decoding complex, dynamic surgical environments. This PRISMA-ScR-guided scoping review systematically maps the evolving landscape of SG research in surgery, charting its applications, methodological advancements, and future directions. Our analysis reveals rapid growth, yet uncovers a critical 'data divide': internal-view research (e.g., triplet recognition) almost exclusively uses real-world 2D video, while external-view 4D modeling relies heavily on simulated data, exposing a key translational research gap. Methodologically, the field has advanced from foundational graph neural networks to specialized foundation models that now significantly outperform generalist large vision-language models in surgical contexts. This progress has established SGs as a cornerstone technology for both analytical tasks, such as workflow recognition and automated safety monitoring, and generative tasks, such as controllable surgical simulation. Although challenges in data annotation and real-time implementation persist, they are actively being addressed through emerging techniques. Surgical SGs are maturing into an essential semantic bridge, enabling a new generation of intelligent systems to improve surgical safety, efficiency, and training.
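
For readers unfamiliar with the representation, the following minimal Python sketch illustrates what a scene graph for a single internal-view frame might look like: entities as nodes and (subject, predicate, object) triplets as directed edges. The class names (`Relation`, `SceneGraph`) and the instrument, action, and anatomy labels are hypothetical and chosen only for illustration; they are not taken from any dataset or method covered in the review.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Relation:
    """One directed edge: a (subject, predicate, object) triplet."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Entities as nodes, relations between them as edges (illustrative only)."""
    nodes: Set[str] = field(default_factory=set)
    edges: List[Relation] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Register both entities and record the relation between them.
        self.nodes.update((subject, obj))
        self.edges.append(Relation(subject, predicate, obj))

    def relations_of(self, entity: str) -> List[Relation]:
        # All relations in which the entity appears as subject or object.
        return [e for e in self.edges if entity in (e.subject, e.obj)]


# Hypothetical labels for one laparoscopic frame (assumed, not from the review).
frame = SceneGraph()
frame.add("grasper", "retract", "gallbladder")
frame.add("hook", "dissect", "cystic_duct")
```

A query such as `frame.relations_of("grasper")` then returns the triplets involving that instrument, which is the kind of structured lookup that downstream tasks like workflow recognition could build on.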