Sketch In, Sketch Out: Accelerating both Learning and Inference for Structured Prediction with Kernels
This addresses scalability for researchers and practitioners using structured prediction on complex datasets, but it is incremental as it builds on existing kernel methods.
The paper tackles the scalability issue of surrogate kernel methods for structured prediction by introducing sketching-based approximations for both input and output feature maps, achieving state-of-the-art performance on benchmark datasets where non-sketched methods are intractable.
Leveraging the kernel trick in both the input and output spaces, surrogate kernel methods are a flexible and theoretically grounded solution to structured output prediction. If they provide state-of-the-art performance on complex data sets of moderate size (e.g., in chemoinformatics), these approaches however fail to scale. We propose to equip surrogate kernel methods with sketching-based approximations, applied to both the input and output feature maps. We prove excess risk bounds on the original structured prediction problem, showing how to attain close-to-optimal rates with a reduced sketch size that depends on the eigendecay of the input/output covariance operators. From a computational perspective, we show that the two approximations have distinct but complementary impacts: sketching the input kernel mostly reduces training time, while sketching the output kernel decreases the inference time. Empirically, our approach is shown to scale, achieving state-of-the-art performance on benchmark data sets where non-sketched methods are intractable.