ML · LG · PR · May 13

State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

arXiv:2605.13127 · 23.0
Predicted impact: top 65% in ML · last 90 days · Originality: Incremental advance
AI Analysis

This work advances DPP-based subsampling for machine learning by providing new kernels with improved theoretical guarantees and a practical conversion method, though it remains incremental within the existing DPP framework.

The paper introduces new determinantal point processes (DPPs) based on wavelets with provably better accuracy guarantees than existing methods, and a general method to convert continuous DPPs into discrete kernels for efficient minibatch and coreset construction, enabling application to objective functions with low regularity.

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.
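As a rough illustration of why a low-rank decomposition makes sampling cheap, the sketch below draws a minibatch from a projection DPP whose kernel is K = V Vᵀ, working only with the N × k factor V. The Haar feature map, the helper names (`haar`, `haar_features`, `sample_projection_dpp`) and the toy data are assumptions made for illustration. The sampler is the standard sequential algorithm for projection DPPs, not the paper's wavelet kernel or its continuous-to-discrete conversion.

```python
import numpy as np

def haar(t):
    # Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere.
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def haar_features(x, levels=2):
    # Hypothetical feature map: constant function plus Haar wavelets
    # up to the given number of levels, evaluated at the data points.
    cols = [np.ones_like(x)]
    for j in range(levels):
        for m in range(2 ** j):
            cols.append(2 ** (j / 2) * haar(2 ** j * x - m))
    return np.stack(cols, axis=1)

def sample_projection_dpp(V, rng):
    # V: N x k matrix with orthonormal columns, so K = V V^T is a projection.
    # Returns k distinct indices drawn from the DPP with kernel K.
    V = V.copy()
    n, k = V.shape
    sample = []
    for _ in range(k):
        # Item i is picked with probability proportional to its squared row norm.
        p = np.sum(V ** 2, axis=1)
        p = p / p.sum()
        i = int(rng.choice(n, p=p))
        sample.append(i)
        # Project the column space onto vectors that vanish at item i.
        j = int(np.argmax(np.abs(V[i])))
        vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V -= np.outer(vj, V[i] / vj[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)  # re-orthonormalise remaining columns
    return sorted(sample)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)          # toy one-dimensional dataset
V, _ = np.linalg.qr(haar_features(x))        # N x 4 factor, orthonormal columns
inclusion = np.sum(V ** 2, axis=1)           # K_ii = P(item i is in the batch)
batch = sample_projection_dpp(V, rng)

# Inverse-inclusion weighting gives an unbiased estimate of sum_i f(x_i).
f = np.sin(2 * np.pi * x) ** 2
estimate = np.sum(f[batch] / inclusion[batch])
```

The N × N kernel is never formed: every step touches only an N × k matrix, which is the practical payoff of having a low-rank factor. Because each item is included with probability K_ii, dividing by `inclusion` makes `estimate` unbiased for the full sum of `f` over the dataset.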

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
