DenseTRF: Texture-Aware Unsupervised Representation Adaptation for Surgical Scene Dense Prediction
This work improves the robustness of dense prediction models in surgical computer vision, a domain where training data rarely covers the variability encountered at deployment.
DenseTRF addresses domain shift in surgical scene dense prediction by using texture-aware representations learned via slot attention, achieving improved cross-distribution generalization over state-of-the-art methods.
Dense prediction tasks in surgical computer vision, such as segmentation and surgical zone prediction, can provide valuable guidance for laparoscopic and robotic surgery. However, models for these tasks often suffer from distribution shift: training datasets rarely cover the variability encountered during deployment, which leads to poor generalization. We propose DenseTRF, a self-supervised representation adaptation framework based on texture-centric attention. Our method uses slot attention to learn texture-aware representations that capture invariant visual structures. By adapting these representations to the target distribution without supervision, DenseTRF improves robustness to domain shift. The framework is implemented by conditioning dense prediction on slot attention and by applying model merging strategies. Experiments across multiple surgical procedures demonstrate improved cross-distribution generalization compared with state-of-the-art segmentation models and test-distribution adaptation methods for dense prediction tasks.
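To make the slot attention mechanism mentioned above concrete, the following is a minimal sketch of its core update, not the DenseTRF implementation. It assumes the standard formulation in which slots compete for input features through a softmax over the slot axis and are then updated with an attention-weighted mean of the features. The learned query/key/value projections, layer normalization, GRU update, and MLP of the full module are omitted for brevity, and the names `slot_attention_step`, `slot_attention`, `num_slots`, and `iters` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention_step(slots, feats, eps=1e-8):
    """One simplified slot attention iteration (assumption: no learned layers).

    slots: K lists of length D; feats: N lists of length D.
    Returns (new_slots, attn), where attn[i][k] is the weight of slot k
    for feature i.
    """
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    # Softmax over the slot axis: slots compete for each feature.
    attn = [
        softmax([scale * sum(f * s for f, s in zip(feat, slot)) for slot in slots])
        for feat in feats
    ]
    new_slots = []
    for k in range(len(slots)):
        # Renormalise over features so each slot update is a weighted mean.
        col = [attn[i][k] + eps for i in range(len(feats))]
        total = sum(col)
        new_slots.append([
            sum(col[i] * feats[i][d] for i in range(len(feats))) / total
            for d in range(dim)
        ])
    return new_slots, attn


def slot_attention(feats, num_slots=2, iters=3, seed=0):
    """Run a few iterations starting from randomly initialised slots."""
    rng = random.Random(seed)
    dim = len(feats[0])
    slots = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]
    attn = []
    for _ in range(iters):
        slots, attn = slot_attention_step(slots, feats)
    return slots, attn


# Toy "texture" features: two groups of nearly identical vectors.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
slots, attn = slot_attention(feats)
```

In this sketch each row of `attn` is a distribution over slots for one feature, so the slots partition the input softly. In a dense prediction setting, such per-location assignments are the kind of signal a prediction head could be conditioned on, though how DenseTRF does so is not specified here.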