CV, LG

Dec 7, 2025

Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT

Matan Atad, Alexander W. Marka, Lisa Steinhelfer, Anna Curto-Vilalta, Yannik Leonhardt, Sarah C. Foreman, Anna-Sophia Walburga Dietrich, Robert Graf, Alexandra S. Gersing, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke

arXiv:2512.06849v1

h-index: 69

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scaling accurate segmentation of vertebral metastases in CT, a clinically important task that is limited by scarce voxel-level annotations. The contribution is incremental: it builds on existing weakly supervised and generative methods.

The paper tackles the segmentation of vertebral metastases in CT scans without voxel-level annotations. It introduces a weakly supervised method that uses only vertebra-level labels and reports F1 scores of 0.91 for blastic and 0.85 for lytic lesions.

Accurate segmentation of vertebral metastasis in CT is clinically important yet difficult to scale, as voxel-level annotations are scarce and both lytic and blastic lesions often resemble benign degenerative changes. We introduce a weakly supervised method trained solely on vertebra-level healthy/malignant labels, without any lesion masks. The method combines a Diffusion Autoencoder (DAE) that produces a classifier-guided healthy edit of each vertebra with pixel-wise difference maps that propose candidate lesion regions. To determine which regions truly reflect malignancy, we introduce Hide-and-Seek Attribution: each candidate is revealed in turn while all others are hidden, the edited image is projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that component. High-scoring regions form the final lytic or blastic segmentation. On held-out radiologist annotations, we achieve strong blastic/lytic performance despite no mask supervision (F1: 0.91/0.85; Dice: 0.87/0.78), exceeding baselines (F1: 0.79/0.67; Dice: 0.74/0.55). These results show that vertebra-level labels can be transformed into reliable lesion masks, demonstrating that generative editing combined with selective occlusion supports accurate weakly supervised segmentation in CT.
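The reveal-one, hide-the-rest loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes 2D images stored as nested lists, takes candidates to be 4-connected regions of a thresholded difference map, and treats "hidden" as "replaced by the healthy edit". The names `score_fn`, `project`, `threshold`, and `keep_threshold` are hypothetical: `score_fn` stands in for the latent-space classifier, and `project` stands in for the DAE projection back to the data manifold (an identity function by default here).

```python
from collections import deque


def difference_map(original, healthy_edit):
    """Pixel-wise absolute difference between an image and its healthy edit."""
    return [[abs(o - h) for o, h in zip(row_o, row_h)]
            for row_o, row_h in zip(original, healthy_edit)]


def candidate_regions(diff, threshold):
    """Return 4-connected regions (sets of (row, col)) where diff > threshold."""
    rows, cols = len(diff), len(diff[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if diff[r][c] <= threshold or (r, c) in seen:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and diff[ny][nx] > threshold
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def hide_and_seek(original, healthy_edit, regions, score_fn,
                  keep_threshold, project=lambda img: img):
    """Score each candidate in isolation and keep the high-scoring ones.

    score_fn and project are hypothetical placeholders for the latent-space
    classifier and the DAE projection mentioned in the abstract.
    """
    rows, cols = len(original), len(original[0])
    mask = [[0] * cols for _ in range(rows)]
    scores = []
    for region in regions:
        # Start from the healthy edit (all candidates hidden), then reveal one.
        revealed = [row[:] for row in healthy_edit]
        for y, x in region:
            revealed[y][x] = original[y][x]
        score = score_fn(project(revealed))
        scores.append(score)
        if score >= keep_threshold:
            for y, x in region:
                mask[y][x] = 1
    return mask, scores


# Toy data: one strong candidate (two pixels) and one weak candidate (one pixel).
original = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
healthy_edit = [[0.0] * 5 for _ in range(4)]


def toy_score(img):
    """Toy stand-in for a malignancy score: the maximum intensity."""
    return max(max(row) for row in img)


regions = candidate_regions(difference_map(original, healthy_edit), threshold=0.1)
mask, scores = hide_and_seek(original, healthy_edit, regions, toy_score,
                             keep_threshold=0.5)
```

In this toy run the weak candidate appears in the difference map but is dropped because its isolated score falls below `keep_threshold`, which mirrors the abstract's point that a difference from the healthy edit alone does not establish malignancy.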
