LG · OC · May 11

Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

arXiv:2605.10313
Predicted impact: top 71% in LG (last 90 days) · Originality: incremental advance
AI Analysis

This work provides a practical framework for sequential decision-making with complex reward structures, relevant to applications like sensor monitoring and healthcare.

The paper introduces a signature-transform-based approach for contextual bandits with nonlinear and path-dependent rewards, enabling linear methods via signature features. The proposed DisSigUCB algorithm achieves sublinear regret and outperforms baselines in experiments.

We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\) where \(d\) is the context dimension and \(m\) is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing demonstrate that \texttt{DisSigUCB} consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.
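As a rough illustration of the idea in the abstract, the sketch below computes a depth-2 truncated signature of a piecewise-linear path and feeds it to a disjoint (one model per arm) linear UCB learner. It is not the paper's implementation. The names `signature_level2`, `DisjointSigUCB`, `alpha`, and `lam`, the truncation at depth 2, the fixed exploration weight, and the choice to concatenate the context vector with the signature features are all assumptions made for this example.

```python
import numpy as np


def signature_level2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    path: array-like of shape (n_points, c).
    Returns a vector of length c + c*c (the constant level-0 term is omitted).
    """
    path = np.asarray(path, dtype=float)
    c = path.shape[1]
    s1 = np.zeros(c)        # level 1: total increment of the path
    s2 = np.zeros((c, c))   # level 2: iterated integrals
    for dx in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment dx
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])


class DisjointSigUCB:
    """Hypothetical disjoint linear UCB over a fixed feature vector z."""

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # exploration weight (assumed constant here)
        # One ridge-regression design matrix and response vector per arm
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, z):
        scores = []
        for A, b in zip(self.A, self.b):
            theta = np.linalg.solve(A, b)               # ridge estimate
            width = np.sqrt(z @ np.linalg.solve(A, z))  # confidence width
            scores.append(theta @ z + self.alpha * width)
        return int(np.argmax(scores))

    def update(self, arm, z, reward):
        self.A[arm] += np.outer(z, z)
        self.b[arm] += reward * z


# Toy usage: a 2-channel path plus a 3-dimensional static context.
rng = np.random.default_rng(0)
path = np.cumsum(rng.normal(size=(20, 2)), axis=0)
context = rng.normal(size=3)
z = np.concatenate([context, signature_level2(path)])  # assumed feature map

bandit = DisjointSigUCB(n_arms=4, dim=z.size)
arm = bandit.select(z)
bandit.update(arm, z, reward=1.0)  # the reward value is a placeholder
```

In this toy setup the feature vector has 3 + 6 = 9 entries, loosely mirroring the \(d + m\) term in the stated bound. A higher truncation depth would enlarge \(m\) and, by the universal nonlinearity property the abstract invokes, should allow a closer linear approximation of the reward functional.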

