LG · Oct 21, 2025

Learning Task-Agnostic Representations through Multi-Teacher Distillation

Philippe Formont, Maxime Darrin, Banafsheh Karimian, Jackie CK Cheung, Eric Granger, Ismail Ben Ayed, Mohammadhadi Shateri, Pablo Piantanida

arXiv:2510.18680v1 · 9.43 citations · h-index: 50

Originality: Incremental advance

AI Analysis

This work addresses the challenge of creating versatile embeddings for various downstream tasks in fields like text, vision, and molecular modeling, though it appears incremental as it builds on existing multi-teacher distillation methods.

The paper tackles the problem of learning task-agnostic representations by proposing a 'majority vote' objective function for multi-teacher distillation, which eliminates the need for task-specific labels and improves performance across text, vision, and molecular modeling tasks.

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
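The abstract ties the objective to the mutual information between the student's and the teachers' embeddings. As a rough illustration only, and not the authors' implementation, one standard label-free surrogate is to minimize the negative log-likelihood of each teacher embedding given the student embedding: this upper-bounds the conditional entropy H(T | S) and therefore maximizes a variational lower bound on I(S; T). The sketch below assumes an isotropic Gaussian conditional model with a linear prediction head per teacher. The names `linear_head`, `gaussian_nll`, and `multi_teacher_loss` are hypothetical.

```python
import math

def linear_head(s, W, b):
    """Predict a teacher embedding mean from one student embedding s.

    Assumption: a linear map W (d_s x d_t) plus bias b (length d_t).
    """
    return [sum(s_i * W[i][j] for i, s_i in enumerate(s)) + b[j]
            for j in range(len(b))]

def gaussian_nll(student_batch, teacher_batch, W, b, log_sigma=0.0):
    """Batch mean of -log N(t | head(s), sigma^2 I).

    Minimizing this upper-bounds H(T | S), i.e. it maximizes a
    variational lower bound on I(S; T). No task labels are used.
    """
    d_t = len(b)
    const = d_t * (log_sigma + 0.5 * math.log(2 * math.pi))
    total = 0.0
    for s, t in zip(student_batch, teacher_batch):
        mean = linear_head(s, W, b)
        sq_err = sum((t_j - m_j) ** 2 for t_j, m_j in zip(t, mean))
        total += 0.5 * sq_err / math.exp(2 * log_sigma) + const
    return total / len(student_batch)

def multi_teacher_loss(student_batch, teachers):
    """Sum the per-teacher terms.

    `teachers` is a list of (teacher_batch, W, b, log_sigma) tuples,
    one per teacher (a hypothetical layout chosen for this sketch).
    """
    return sum(gaussian_nll(student_batch, t, W, b, ls)
               for t, W, b, ls in teachers)

# Toy usage: two student embeddings, two one-dimensional "teachers".
students = [[1.0, 0.0], [0.0, 1.0]]
W, b = [[2.0], [3.0]], [0.0]
predictable = [[2.0], [3.0]]    # exactly recoverable from the student
unpredictable = [[-2.0], [5.0]] # poorly explained by the same head

loss_good = gaussian_nll(students, predictable, W, b)
loss_bad = gaussian_nll(students, unpredictable, W, b)
total = multi_teacher_loss(
    students, [(predictable, W, b, 0.0), (unpredictable, W, b, 0.0)]
)
```

In this toy setup the teacher that the student can predict well yields the smaller term, so gradient steps on the summed loss would push the student toward embeddings from which all teachers are recoverable. In practice the heads and the student would be neural networks trained jointly.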

