CV · Oct 30, 2025

FlexICL: A Flexible Visual In-context Learning Framework for Elbow and Wrist Ultrasound Segmentation

Yuyue Zhou, Jessica Knight, Shrimanti Ghosh, Banafshe Felfeliyan, Jacob L. Jaremko, Abhilash R. Hareendranathan

arXiv:2510.26049v1 · h-index: 14

Originality: Incremental advance

AI Analysis

This work aims to reduce the expert annotation cost of training ultrasound segmentation models for pediatric fracture diagnosis. The advance appears incremental, since it builds on existing visual in-context learning methods.

The paper tackles automatic segmentation of bony regions in elbow and wrist ultrasound images, a task made difficult by the cost of expert annotations. It proposes FlexICL, a flexible in-context learning framework that achieves robust segmentation performance while requiring only 5% of the training images, and that outperforms state-of-the-art models by 1-27% in Dice coefficient on 1,252 ultrasound sweeps.
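For readers unfamiliar with the metric behind the reported 1-27% gains, the Dice coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that formula for binary masks, not the paper's evaluation code. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks.

    Masks are nested lists (rows of 0/1 values). This is an illustrative
    helper, not the evaluation code used in the paper.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a & b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_coefficient(predicted, reference))
```

A score of 1.0 means the two masks overlap exactly, and 0.0 means they share no foreground pixels.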

Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames, and the model segments unseen frames. We systematically investigate various image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models such as Painter and MAE-VQGAN, as well as conventional segmentation models such as U-Net and TransUNet, by 1-27% in Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation, well suited to medical imaging use cases where labeled data is scarce.
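The intra-video setting described in the abstract, in which experts label a small subset of frames and the model segments the rest, can be illustrated with a short sketch of how such a subset might be chosen. The abstract does not say how frames are selected. Evenly spaced sampling, the function name `select_annotation_frames`, and the 5% default are all assumptions made here for illustration, not the authors' procedure.

```python
def select_annotation_frames(num_frames, fraction=0.05):
    """Pick evenly spaced frame indices to send for expert annotation.

    Hypothetical helper: uniform spacing is an assumption, since the
    abstract only states that a small subset of frames is annotated.
    """
    if num_frames <= 0:
        return []
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Always annotate at least one frame per sweep.
    k = max(1, round(num_frames * fraction))

    # Split the sweep into k equal segments and take each segment's center.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


if __name__ == "__main__":
    labeled = select_annotation_frames(100)
    unlabeled = [i for i in range(100) if i not in set(labeled)]
    print(labeled)         # frames an expert would annotate
    print(len(unlabeled))  # frames left for the model to segment
```

In this sketch the labeled frames would serve as in-context examples, and the remaining frames of the same sweep would be the queries.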
