Distribution Regression for Sequential Data
This work addresses the problem of learning from grouped sequential data, a setting relevant to researchers and practitioners in fields such as thermodynamics, finance and agriculture, and proposes new methods for it.
The paper tackles distribution regression for sequential data by developing two new learning techniques based on expected signatures and signature kernels, achieving state-of-the-art performance on synthetic and real-world datasets from thermodynamics, finance, and agriculture.
Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where the inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and empirically demonstrate their robustness to irregularly sampled multivariate time series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
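The feature-based idea can be illustrated with a deliberately simplified sketch. It assumes that each stream is interpolated piecewise-linearly, that time is added as an extra channel (one common way to handle irregular sampling), that signatures are truncated at a small depth, and that plain ridge regression is fitted on the averaged signatures of each group (the empirical expected signature). All function names (`signature`, `expected_signature`, `fit_ridge`, `make_group`, ...) and the toy data are hypothetical illustrations, not the authors' implementation; dedicated libraries compute signatures far more efficiently.

```python
import numpy as np


def segment_signature(delta, depth):
    """Truncated signature of one linear segment with increment `delta`.

    For a straight line the signature is the tensor exponential
    (1, delta, delta^(x)2 / 2!, delta^(x)3 / 3!, ...).
    """
    levels = [np.array(1.0)]
    for k in range(1, depth + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / k)
    return levels


def chen_product(a, b, depth):
    """Concatenate two truncated signatures via Chen's identity (truncated tensor product)."""
    return [
        sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
        for k in range(depth + 1)
    ]


def signature(path, depth):
    """Flattened truncated signature (levels 1..depth) of a piecewise-linear path of shape (n_points, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    sig = [np.array(1.0)] + [np.zeros((d,) * k) for k in range(1, depth + 1)]
    for delta in np.diff(path, axis=0):
        sig = chen_product(sig, segment_signature(delta, depth), depth)
    return np.concatenate([np.ravel(level) for level in sig[1:]])


def add_time(times, values):
    """Augment a (possibly irregularly sampled) stream with its timestamps as an extra channel."""
    return np.column_stack([np.asarray(times, float), np.asarray(values, float)])


def expected_signature(streams, depth):
    """Empirical expected signature: the mean signature over all streams in one group."""
    return np.mean([signature(s, depth) for s in streams], axis=0)


def fit_ridge(features, targets, alpha=1e-3):
    """Closed-form ridge regression with an unpenalised intercept."""
    X = np.column_stack([np.ones(len(features)), features])
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ targets)


def predict(weights, features):
    return weights[0] + np.asarray(features) @ weights[1:]


# Hypothetical toy data: each group is a bag of noisy, irregularly sampled
# streams x(t) = theta * t + noise, and the only label is the group's theta.
rng = np.random.default_rng(0)


def make_group(theta, n_streams=20):
    streams = []
    for _ in range(n_streams):
        n = rng.integers(10, 30)  # irregular number of observations per stream
        t = np.sort(rng.uniform(0.0, 1.0, n))
        x = theta * t + 0.1 * rng.standard_normal(n)
        streams.append(add_time(t, x))
    return streams


depth = 2
train_thetas = rng.uniform(0.0, 2.0, 40)
test_thetas = rng.uniform(0.0, 2.0, 10)
train_X = np.array([expected_signature(make_group(th), depth) for th in train_thetas])
test_X = np.array([expected_signature(make_group(th), depth) for th in test_thetas])
weights = fit_ridge(train_X, train_thetas)
predictions = predict(weights, test_X)
```

Because every group is reduced to a fixed-length vector regardless of how many streams it contains or how irregularly they are sampled, any standard regressor can sit on top of these features. The kernel-based variant mentioned in the abstract instead compares groups through a signature kernel and is not sketched here.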