LG · Oct 29, 2025

Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning

Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

arXiv:2510.25759v2 · 1 citation · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses a critical limitation of MIL in medical imaging, where ignoring correlations between instances can lead to suboptimal predictions. The advance is incremental, however, since it builds on existing correlated MIL methods.

The paper tackles the problem that conventional multiple instance learning (MIL) methods ignore contextual relationships between instances, such as nearby patches or slices. The authors design a synthetic classification task in which accounting for adjacent instance features is crucial. They show that both off-the-shelf MIL approaches and newer correlated MIL methods fall short of the Bayes-optimal performance, with gaps remaining even with ten thousand training samples.
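The sketch below makes the limitation concrete under assumed, hypothetical conditions; it is not the paper's task. Mean and max pooling are two common permutation-invariant MIL aggregators. They map two bags that contain the same instances in different orders to the same representation. Any classifier built on top of them therefore cannot recover a label that depends on which instances are adjacent. The function names and the "adjacent positives" bag rule are illustrative only.

```python
import numpy as np

def mean_pool(bag):
    """Permutation-invariant MIL pooling: average the instance features."""
    return bag.mean(axis=0)

def max_pool(bag):
    """Permutation-invariant MIL pooling: elementwise max over instances."""
    return bag.max(axis=0)

def has_adjacent_positives(instance_labels):
    """Hypothetical bag label: 1 if any two consecutive instances are positive."""
    z = np.asarray(instance_labels)
    return int(np.any(z[:-1] & z[1:]))

# Two bags with the same instances in different orders.
# Each instance has a 1-D feature equal to its (hypothetical) instance label.
z_a = np.array([1, 1, 0, 0])  # positives are adjacent
z_b = np.array([1, 0, 1, 0])  # same multiset, positives separated
bag_a = z_a[:, None].astype(float)
bag_b = z_b[:, None].astype(float)

print(np.allclose(mean_pool(bag_a), mean_pool(bag_b)))  # True
print(np.allclose(max_pool(bag_a), max_pool(bag_b)))    # True
print(has_adjacent_positives(z_a), has_adjacent_positives(z_b))  # 1 0
```

Both pooled representations coincide while the bag labels differ. Aggregators that treat instances as an unordered set discard exactly the information this kind of label needs.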

Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships such as the appearance of nearby patches or slices that can be essential in real applications. We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator for this task, which is available in closed-form. We empirically show that newer correlated MIL methods still do not achieve the best possible performance when trained with ten thousand training samples, each containing many instances.
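The abstract says the Bayes estimator is available in closed form. A hypothetical toy model, not the authors' synthetic task, shows what such an estimator can look like. The assumptions are:

- instance labels z_i are i.i.d. Bernoulli(prior);
- each feature is x_i ~ N(mu_z, sigma^2);
- a bag is positive if any two adjacent instances are positive.

Under these assumptions the instance posteriors are independent given x, so P(y = 1 | x) follows exactly from a two-state forward recursion. All names and parameter values (mu0, mu1, sigma, prior) are illustrative.

```python
import numpy as np

# Hypothetical toy model (an assumption, not the paper's task):
#   z_i ~ Bernoulli(prior) i.i.d.,  x_i | z_i ~ N(mu_{z_i}, sigma^2),
#   bag label y = 1 iff some adjacent pair has z_i = z_{i+1} = 1.

def instance_posteriors(x, mu0=0.0, mu1=1.0, sigma=1.0, prior=0.5):
    """P(z_i = 1 | x_i) for each instance, via the Gaussian log-odds."""
    x = np.asarray(x, dtype=float)
    log_odds = np.log(prior / (1.0 - prior)) + (
        (x - mu0) ** 2 - (x - mu1) ** 2
    ) / (2.0 * sigma**2)
    return 1.0 / (1.0 + np.exp(-log_odds))

def bag_posterior(x, **model):
    """Exact P(y = 1 | x) for the toy model.

    The z_i stay independent given x, so the probability of *no* adjacent
    positive pair is computed by a forward recursion over two states:
    q0 = P(no pair so far, last z = 0), q1 = P(no pair so far, last z = 1).
    """
    p = instance_posteriors(x, **model)
    q0, q1 = 1.0 - p[0], p[0]
    for p_i in p[1:]:
        q0, q1 = (q0 + q1) * (1.0 - p_i), q0 * p_i
    return 1.0 - (q0 + q1)

def bayes_predict(x, **model):
    """Bayes-optimal decision under 0-1 loss: threshold the posterior at 0.5."""
    return int(bag_posterior(x, **model) > 0.5)

# Same feature values, different order.
x_a = [3.0, 3.0, -2.0, -2.0]  # likely positives are adjacent
x_b = [3.0, -2.0, 3.0, -2.0]  # likely positives are separated
print(bayes_predict(x_a), bayes_predict(x_b))  # 1 0
```

The two bags contain identical feature values, yet they receive different Bayes predictions because the posterior depends on instance order. This gap is one a permutation-invariant MIL model cannot close in this toy setting.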

