LGFeb 2, 2024

Limited Memory Online Gradient Descent for Kernelized Pairwise Learning with Dynamic Averaging

arXiv:2402.01146v11 citationsh-index: 3AAAI
Originality Incremental advance
AI Analysis

This work addresses scalability issues in pairwise learning for applications like metric learning and AUC maximization, offering a more efficient kernel-based approach.

The paper tackles the computational complexity of pairwise learning by introducing a lightweight online gradient descent algorithm that works with kernel models and does not require independent examples, achieving a sub-linear regret bound with O(T) complexity and outperforming existing methods in experiments.

Pairwise learning, an important domain within machine learning, addresses loss functions defined on pairs of training examples, including those in metric learning and AUC maximization. Acknowledging the quadratic growth in computation complexity accompanying pairwise loss as the sample size grows, researchers have turned to online gradient descent (OGD) methods for enhanced scalability. Recently, an OGD algorithm emerged, employing gradient computation involving prior and most recent examples, a step that effectively reduces algorithmic complexity to $O(T)$, with $T$ being the number of received examples. This approach, however, confines itself to linear models while assuming the independence of example arrivals. We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernel pairwise learning. Our algorithm builds the gradient based on a random example and a moving average representing the past data, which results in a sub-linear regret bound with a complexity of $O(T)$. Furthermore, through the integration of $O(\sqrt{T}{\log{T}})$ random Fourier features, the complexity of kernel calculations is effectively minimized. Several experiments with real-world datasets show that the proposed technique outperforms kernel and linear algorithms in offline and online scenarios.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes