In-Context Multiple Instance Learning
This work addresses the low-label regime in Multiple Instance Learning (MIL), a bottleneck for real-world applications such as computational pathology and satellite imagery, by providing a flexible yet robust few-shot learning method.
The paper introduces an in-context learning approach for Multiple Instance Learning (MIL): a Perceiver-style architecture is pretrained on synthetic data and then classifies new bags from a few labeled bags in a single forward pass, with no gradient updates. When pretrained on a mixture of synthetic data generators, the model achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
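To make the setting concrete, the following is a minimal sketch of bag-structured synthetic data and a few-shot episode. It is not one of the paper's generators: `Bag`, `sample_bag`, `sample_episode`, and all parameters (`dim`, `max_size`, `p_positive`, `shift`) are hypothetical names chosen for illustration. The labeling rule is the standard MIL assumption, under which a bag is positive if and only if it contains at least one positive ("witness") instance.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Instance = List[float]


@dataclass
class Bag:
    instances: List[Instance]  # variable-size set of feature vectors
    label: int                 # a single label for the whole bag


def sample_bag(rng: random.Random, dim: int = 4, max_size: int = 8,
               p_positive: float = 0.5, shift: float = 3.0) -> Bag:
    """Hypothetical generator (an assumption, not the paper's):
    background instances are Gaussian noise, and a positive bag
    gets exactly one planted witness instance."""
    size = rng.randint(2, max_size)  # bag sizes vary between 2 and max_size
    instances = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(size)]
    label = int(rng.random() < p_positive)
    if label == 1:
        # Plant the witness by shifting the first feature of one instance.
        witness = rng.randrange(size)
        instances[witness][0] += shift
    return Bag(instances, label)


def sample_episode(rng: random.Random, n_context: int = 6,
                   n_query: int = 3) -> Tuple[List[Bag], List[Bag]]:
    """Split freshly sampled bags into a labeled context set and a
    query set whose labels the in-context learner must predict."""
    bags = [sample_bag(rng) for _ in range(n_context + n_query)]
    return bags[:n_context], bags[n_context:]


rng = random.Random(0)
context, query = sample_episode(rng)
# An in-context learner would consume the labeled context bags together
# with the unlabeled query bags and emit query predictions in one
# forward pass; the model itself is out of scope for this sketch.
```

During pretraining, many such episodes would be drawn and the query labels used as targets; at inference time, the context set is replaced by the handful of labeled bags from the real task.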