LG AI · Sep 26, 2025

SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection

Brian B. Moser, Tobias C. Nauen, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Joachim Folz, Andreas Dengel

arXiv:2509.21748v1 · h-index: 13

Originality: Highly original

AI Analysis

This work addresses inefficient coreset selection, a practical problem for machine learning practitioners, with a training-free approach that is scalable and robust. The analysis rates it as a significant advancement rather than an incremental improvement.

The paper tackles the inefficiency of existing coreset selection methods that require expensive training signals by introducing SubZeroCore, a training-free method that matches or outperforms training-based baselines, especially at high pruning rates, while reducing computational overhead and showing robustness to label noise.

The goal of coreset selection is to identify representative subsets of datasets for efficient model training. Yet, existing approaches paradoxically require expensive training-based signals, e.g., gradients, decision boundary estimates or forgetting counts, computed over the entire dataset prior to pruning, which undermines their very purpose by requiring training on samples they aim to avoid. We introduce SubZeroCore, a novel, training-free coreset selection method that integrates submodular coverage and density into a single, unified objective. To achieve this, we introduce a sampling strategy based on a closed-form solution to optimally balance these objectives, guided by a single hyperparameter that explicitly controls the desired coverage for local density measures. Despite no training, extensive evaluations show that SubZeroCore matches training-based baselines and significantly outperforms them at high pruning rates, while dramatically reducing computational overhead. SubZeroCore also demonstrates superior robustness to label noise, highlighting its practical effectiveness and scalability for real-world scenarios.
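The abstract does not give the closed-form sampling rule or the exact objective, so the following is only a generic illustration of combining submodular coverage with a local density term, not the authors' method. It greedily maximizes a density-weighted facility-location function over precomputed features. The function name `greedy_density_coverage`, the k-nearest-neighbour density estimate, and the similarity `1 / (1 + distance)` are all assumptions made for this sketch.

```python
import numpy as np

def greedy_density_coverage(X, budget, k=3):
    """Hypothetical sketch: greedy maximization of a density-weighted
    facility-location objective. Not the SubZeroCore objective itself."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all feature vectors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Similarity in (0, 1]; an assumed choice for illustration.
    sim = 1.0 / (1.0 + dist)
    # Local density: inverse distance to the k-th nearest neighbour
    # (column 0 of the sorted distances is the point itself).
    kth = np.sort(dist, axis=1)[:, k]
    weight = 1.0 / (kth + 1e-12)
    weight /= weight.sum()

    covered = np.zeros(n)            # best similarity to any selected point
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(budget):
        # Marginal gain of each candidate j: weighted coverage improvement.
        improvement = np.maximum(sim - covered[:, None], 0.0)
        gains = (weight[:, None] * improvement).sum(axis=0)
        gains[chosen] = -np.inf      # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        covered = np.maximum(covered, sim[:, j])
    return selected

# Two well-separated toy clusters of five points each.
cluster = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
features = np.vstack([cluster, cluster + 10.0])
picked = greedy_density_coverage(features, budget=2)
print(picked)
```

Because the weighted facility-location function is monotone submodular, the greedy loop carries the usual (1 - 1/e) approximation guarantee for that surrogate objective. No labels, gradients, or model training are involved, which is the property the abstract emphasizes.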
