IV · CV · Jan 21, 2025

Slot-BERT: Self-supervised Object Discovery in Surgical Video

arXiv:2501.12477v3 · 3 citations · h-index: 12 · Medical Image Analysis
Originality: Incremental advance
AI Analysis

This work addresses scalable, temporally coherent object discovery in long surgical videos for medical applications. It is an incremental improvement over existing object-centric methods.

The paper targets the problem of maintaining long-range temporal coherence in unsupervised object discovery for surgical videos. The resulting model, Slot-BERT, surpasses state-of-the-art object-centric approaches and enables efficient zero-shot domain adaptation across diverse surgical specialties.

Object-centric slot attention is a powerful framework for unsupervised learning of structured and explainable representations that can support reasoning about objects and actions, including in surgical videos. While conventional object-centric methods for videos leverage recurrent processing to achieve efficiency, they often struggle to maintain the long-range temporal coherence required for long videos in surgical applications. On the other hand, fully parallel processing of entire videos enhances temporal consistency but introduces significant computational overhead, making it impractical to deploy on the hardware available in medical facilities. We present Slot-BERT, a bidirectional long-range model that learns object-centric representations in a latent space while ensuring robust temporal coherence. Slot-BERT scales object discovery seamlessly to long videos of unconstrained length. A novel slot contrastive loss further reduces redundancy and improves representation disentanglement by enhancing slot orthogonality. We evaluate Slot-BERT on real-world surgical video datasets from abdominal, cholecystectomy, and thoracic procedures. Under unsupervised training, our method surpasses state-of-the-art object-centric approaches, achieving superior performance across diverse domains. We also demonstrate efficient zero-shot domain adaptation to data from diverse surgical specialties and databases.
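
The abstract does not spell out the form of the slot contrastive loss. As a rough illustration of what "enhancing slot orthogonality" could mean, the sketch below penalizes the mean squared cosine similarity between distinct slot vectors of one frame. This is an assumed stand-in, not the paper's loss: the function name `slot_orthogonality_penalty` and the plain-list slot layout are hypothetical.

```python
import math


def slot_orthogonality_penalty(slots):
    """Mean squared cosine similarity over all pairs of distinct slots.

    `slots` is a list of K equal-length, non-zero vectors (lists of floats).
    Returns 0.0 when the slots are mutually orthogonal and 1.0 when they
    are all collinear. Hypothetical helper, not taken from the paper.
    """
    # L2-normalize each slot so dot products become cosine similarities.
    normed = []
    for s in slots:
        norm = math.sqrt(sum(x * x for x in s))
        if norm == 0.0:
            raise ValueError("slot vectors must be non-zero")
        normed.append([x / norm for x in s])

    k = len(normed)
    if k < 2:
        return 0.0  # a single slot has nothing to be redundant with

    total = 0.0
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            cos = sum(a * b for a, b in zip(normed[i], normed[j]))
            total += cos * cos  # squared, so anti-aligned slots also count
            pairs += 1
    return total / pairs
```

Under this assumption, minimizing the penalty pushes slots toward encoding non-overlapping directions in the latent space, which is one way redundancy between slots could be reduced.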

