LG · Oct 21, 2025

Learning Task-Agnostic Representations through Multi-Teacher Distillation

arXiv:2510.18680v1 · 3 citations · h-index: 50
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating versatile embeddings for various downstream tasks in fields like text, vision, and molecular modeling, though it appears incremental as it builds on existing multi-teacher distillation methods.

The paper tackles the problem of learning task-agnostic representations by proposing a 'majority vote' objective function for multi-teacher distillation, which eliminates the need for task-specific labels and improves performance across text, vision, and molecular modeling tasks.

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
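The abstract does not spell out the form of the distillation loss. As a rough sketch only, one standard way to turn a mutual-information objective into a label-free loss is the variational bound I(S; T) ≥ H(T) + E[log q(T | S)]: minimizing the negative log-likelihood of each teacher's embedding given the student's embedding maximizes a lower bound on their mutual information. The unit-variance Gaussian form, the linear per-teacher heads, and every name below (gaussian_nll, linear_head, multi_teacher_nll) are illustrative assumptions, not the paper's implementation.

```python
import math


def gaussian_nll(target, mean, log_var=0.0):
    """Negative log-likelihood of `target` under an isotropic Gaussian
    N(mean, exp(log_var) * I). Assumed density family, chosen for simplicity."""
    var = math.exp(log_var)
    sq_dist = sum((t - m) ** 2 for t, m in zip(target, mean))
    dim = len(target)
    return 0.5 * (sq_dist / var + dim * (log_var + math.log(2.0 * math.pi)))


def linear_head(weights, bias, student_emb):
    """Hypothetical per-teacher head: maps the student embedding into
    one teacher's embedding space (one weight row per output dimension)."""
    return [
        sum(w * s for w, s in zip(row, student_emb)) + b
        for row, b in zip(weights, bias)
    ]


def multi_teacher_nll(student_emb, teacher_embs, heads):
    """Sum over teachers of -log q_k(t_k | s). No task label appears anywhere:
    the only supervision is the teachers' embeddings of the same input."""
    total = 0.0
    for t_emb, (weights, bias) in zip(teacher_embs, heads):
        predicted_mean = linear_head(weights, bias, student_emb)
        total += gaussian_nll(t_emb, predicted_mean)
    return total


# Toy example: one 2-d student embedding, two teachers (2-d and 1-d).
student = [1.0, 2.0]
teachers = [[1.0, 2.0], [3.0]]
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # identity head for teacher 1
    ([[1.0, 1.0]], [0.0]),                   # sum head for teacher 2
]
loss = multi_teacher_nll(student, teachers, heads)

# Moving a teacher embedding away from the head's prediction raises the loss.
worse_loss = multi_teacher_nll(student, [[0.0, 0.0], [3.0]], heads)
```

In this toy setup both heads predict their teacher exactly, so the loss reduces to the Gaussian normalization constants; any mismatch between prediction and teacher embedding adds half the squared distance. In practice the student and the heads would be trained jointly by gradient descent on this sum over a dataset.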

