LG

Feb 2

Self-Soupervision: Cooking Model Soups without Labels

Anthony Fuller, James R. Green, Evan Shelhamer

arXiv:2602.02890v1

Originality: Incremental advance

AI Analysis

This work extends model soups to self-supervised learning, aiming for soups that are more robust and adaptable. It is an incremental advance that builds on existing supervised soup methods.

The paper mixes parameters from models trained with different SSL algorithms or hyperparameters, and reports robustness gains of +3.5% on ImageNet-C and +7% on LAION-C.

Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters back into one model (the soup) to improve predictions. While all known soups require supervised learning, and optimize the same loss on labeled data, our recipes for Self-Soupervision generalize soups to self-supervised learning (SSL). Our Self-Souping lets us flavor ingredients on new data sources, e.g. from unlabeled data from a task for transfer or from a shift for robustness. We show that Self-Souping on corrupted test data, then fine-tuning back on uncorrupted train data, boosts robustness by +3.5% (ImageNet-C) and +7% (LAION-C). Self-Soupervision also unlocks countless SSL algorithms to cook the diverse ingredients needed for more robust soups. We show for the first time that ingredients can differ in their SSL hyperparameters and, more surprisingly, in their SSL algorithms. We cook soups of MAE, MoCoV3, and MMCR ingredients that are more accurate than any one single SSL ingredient.
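The mixing step described above (ingredients averaged back into one soup) can be illustrated with a minimal sketch. It assumes plain uniform or weighted parameter averaging, which is one common soup recipe and not necessarily the recipe used in this paper. The `StateDict` type, the `mix_soup` function, and the toy ingredient values are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for real model weights: parameter name -> flat values.
StateDict = Dict[str, List[float]]


def mix_soup(ingredients: List[StateDict],
             weights: Optional[List[float]] = None) -> StateDict:
    """Mix ingredient parameters into one soup by (weighted) averaging.

    Assumption: every ingredient was fine-tuned from the same stock, so all
    ingredients share parameter names and shapes.
    """
    if not ingredients:
        raise ValueError("need at least one ingredient")
    if weights is None:
        # Uniform soup: every ingredient contributes equally.
        weights = [1.0 / len(ingredients)] * len(ingredients)
    if len(weights) != len(ingredients):
        raise ValueError("need exactly one weight per ingredient")

    names = ingredients[0].keys()
    for ingredient in ingredients:
        if ingredient.keys() != names:
            raise ValueError("ingredients must share parameter names")

    soup: StateDict = {}
    for name in names:
        size = len(ingredients[0][name])
        # Average each parameter entry across all ingredients.
        soup[name] = [
            sum(w * ing[name][i] for w, ing in zip(weights, ingredients))
            for i in range(size)
        ]
    return soup


# Toy ingredients with made-up values (not real MAE, MoCoV3, or MMCR weights).
ingredient_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ingredient_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
ingredient_c = {"layer.weight": [5.0, 6.0], "layer.bias": [2.0]}

soup = mix_soup([ingredient_a, ingredient_b, ingredient_c])
print(soup)
```

Averaging of this kind is only meaningful when the ingredients start from a shared stock, since parameters of independently initialized models are generally not aligned entry by entry.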
