Conquering the Retina: Bringing Visual in-Context Learning to OCT
This work addresses the need for flexible, task-agnostic models in medical imaging, specifically for retinal OCT. Its contribution is incremental, since it builds on existing visual in-context learning (VICL) methods.
The paper tackles the challenge of applying visual in-context learning to retinal optical coherence tomography (OCT), aiming for generalist models that adapt to tasks on the fly, and establishes a first baseline through extensive evaluation on multiple datasets.
Recent advancements in medical image analysis have led to the development of highly specialized models tailored to specific clinical tasks. These models have demonstrated exceptional performance and remain a crucial research direction. Yet, their applicability is limited to predefined tasks, requiring expertise and extensive resources for development and adaptation. In contrast, generalist models offer a different form of utility: allowing medical practitioners to define tasks on the fly without the need for task-specific model development. In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography using visual in-context learning (VICL), i.e., training models to generalize across tasks based on a few examples provided at inference time. To facilitate rigorous assessment, we propose a broad evaluation protocol tailored to VICL in OCT. We extensively evaluate a state-of-the-art medical VICL approach on multiple retinal OCT datasets, establishing a first baseline to highlight the potential and current limitations of in-context learning for OCT. To foster further research and practical adoption, we openly release our code.
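To make the idea of defining a task purely through examples at inference time concrete, the following is a minimal toy sketch and not the approach evaluated in this work. The function `predict_in_context`, the 2x2 "scans", and the nearest-mean-intensity rule are all hypothetical stand-ins chosen for illustration; a real VICL model would be a trained network. The point is only the interface: the same predictor, given the same query, solves a different task when the context pairs change.

```python
from collections import defaultdict


def predict_in_context(context, query):
    """Toy stand-in for a VICL model (hypothetical, for illustration only).

    context: list of (image, mask) pairs, each a 2D list of equal shape.
    query:   2D list of pixel intensities.
    Each query pixel receives the label whose mean intensity across the
    context examples is closest to that pixel's value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for image, mask in context:
        for image_row, mask_row in zip(image, mask):
            for value, label in zip(image_row, mask_row):
                sums[label] += value
                counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    return [
        [min(means, key=lambda label: abs(means[label] - value)) for value in row]
        for row in query
    ]


# Hypothetical 2x2 "scan" with two alternative annotations.
image = [[0.1, 0.9], [0.2, 0.8]]
bright_mask = [[0, 1], [0, 1]]  # task A: mark bright pixels as 1
dark_mask = [[1, 0], [1, 0]]    # task B: mark dark pixels as 1

query = [[0.85, 0.15], [0.05, 0.95]]

# The task is selected only by the context; nothing is retrained.
pred_bright = predict_in_context([(image, bright_mask)], query)
pred_dark = predict_in_context([(image, dark_mask)], query)
```

In this sketch, swapping `bright_mask` for `dark_mask` inverts the predicted segmentation of the same query, which mirrors how a practitioner could redefine a task on the fly by supplying different annotated examples.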