Thibaut Cuvelier

3 papers

34 citations

Novelty: 60%

AI Score: 42

Ranked #84,280 of 201,326 authors (top 42%); #1,015 in ML (top 29%)

3 Papers

ML · May 4

Middle-mile logistics through the lens of goal-conditioned reinforcement learning

Onno Eberhard, Thibaut Cuvelier, Michal Valko et al.

Middle-mile logistics describes the problem of routing parcels through a network of hubs linked by trucks with finite capacity. We rephrase this as a multi-object goal-conditioned MDP. Our method combines graph neural networks with model-free RL, extracting small feature graphs from the environment state.
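A toy Python transition can make "multi-object goal-conditioned MDP" concrete. This is a hypothetical sketch, not the authors' environment. The `Parcel`/`Truck` classes, the `step` function, the one-step truck moves and the +1 reward per delivered parcel are all assumptions, and the graph neural network policy is not modeled. Each parcel is an object carrying its own goal (its destination hub), and the reward depends on whether each object reaches that goal.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    location: str  # hub where the parcel currently is
    goal: str      # destination hub: the per-object goal the policy is conditioned on


@dataclass
class Truck:
    origin: str
    destination: str
    capacity: int  # maximum number of parcels on one departure (finite capacity)


def step(parcels, trucks, assignment):
    """One hypothetical transition of a multi-object goal-conditioned MDP.

    assignment maps a parcel index to a truck index; unassigned parcels wait.
    A parcel moves only if it is not yet delivered, its truck leaves from the
    parcel's current hub, and the truck has spare capacity. Parcels are updated
    in place. The reward counts parcels that reach their own goal on this step."""
    load = [0] * len(trucks)
    reward = 0
    for p_idx, t_idx in assignment.items():
        parcel, truck = parcels[p_idx], trucks[t_idx]
        if parcel.location == parcel.goal:
            continue  # already delivered
        if parcel.location != truck.origin or load[t_idx] >= truck.capacity:
            continue  # infeasible move: the parcel stays where it is
        load[t_idx] += 1
        parcel.location = truck.destination
        if parcel.location == parcel.goal:
            reward += 1
    return reward


parcels = [Parcel("A", "C"), Parcel("A", "B"), Parcel("A", "C")]
trucks = [Truck("A", "C", capacity=1), Truck("A", "B", capacity=2)]
r = step(parcels, trucks, {0: 0, 1: 1, 2: 0})
print(r, [p.location for p in parcels])
```

Here truck 0 has capacity 1, so parcel 2 stays at hub A while parcels 0 and 1 reach their goals, giving a reward of 2.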

ML · Feb 14, 2021

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

Thibaut Cuvelier, Richard Combes, Eric Gourdin

We consider combinatorial semi-bandits with uncorrelated Gaussian rewards. In this article, we propose what is, to the best of our knowledge, the first method that makes it possible to compute the solution of the Graves-Lai optimization problem in polynomial time for many combinatorial structures of interest. In turn, this immediately yields the first known approach to implementing asymptotically optimal algorithms in polynomial time for combinatorial semi-bandits.
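The Graves-Lai problem can be stated explicitly for this setting. The block below is a restatement obtained through the standard derivation, assuming unit-variance Gaussian rewards, a unique optimal arm $x^\star$, gaps $\Delta_x = \theta^\top(x^\star - x)$ and the convention $0/0 = 0$; the paper's own notation and normalization may differ. It follows from minimizing $\sum_i N_i (\theta_i - \lambda_i)^2 / 2$, with $N_i = \sum_{z} \eta_z z_i$, over parameters $\lambda$ that agree with $\theta$ on the items of $x^\star$ but make some arm $x$ better than $x^\star$. For a fixed $x$, that minimum equals $\Delta_x^2 / \big(2 \sum_{i \in x \setminus x^\star} 1/N_i\big)$, and requiring it to be at least 1 gives the constraint below. The bound holds for any uniformly good algorithm.

```latex
\liminf_{T \to \infty} \frac{R(T)}{\ln T} \ge C(\theta),
\qquad
C(\theta) = \min_{\eta \in \mathbb{R}_{+}^{\mathcal{X}}} \sum_{x \in \mathcal{X}} \eta_x \Delta_x
\quad \text{s.t.} \quad
\sum_{i=1}^{d} \frac{x_i \, (1 - x^\star_i)}{\sum_{z \in \mathcal{X}} \eta_z z_i} \le \frac{\Delta_x^2}{2}
\quad \forall x \in \mathcal{X} \setminus \{x^\star\}.
```

Here $\eta_x$ can be read as the number of times arm $x$ is played, per $\ln T$, by an algorithm matching the lower bound. The feasible set is convex in $\eta$, but there is one variable and one constraint per arm, which is exponentially many in general; this is the computational obstacle the abstract refers to.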

ML · Feb 17, 2020

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

Thibaut Cuvelier, Richard Combes, Eric Gourdin

We consider combinatorial semi-bandits over a set of arms $\mathcal{X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = \mathcal{O}\big(\frac{d (\ln m)^2 \ln T}{\Delta_{\min}}\big)$, where $m$ is the maximal number of items in an arm, but its computational complexity is $\mathcal{O}(|\mathcal{X}|)$, which is typically exponential in $d$, so it cannot be used in high dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = \mathcal{O}\big(\frac{d (\ln m)^2 \ln T}{\Delta_{\min}}\big)$ and computational complexity $\mathcal{O}(T\,\mathrm{poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time $\mathcal{O}(T\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $\mathcal{X}$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
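To see where the $\mathcal{O}(|\mathcal{X}|)$ cost of exact ESCB comes from, here is a minimal brute-force sketch of one arm-selection step. It assumes an index of the form $x^\top\hat\theta + \sqrt{f(t) \sum_i x_i / N_i(t)}$, where $N_i(t)$ is the number of observations of item $i$; the exact constants and the choice of $f(t)$ differ between ESCB variants and are treated as assumptions here. The arm set (all $m$-sets), the statistics and the helper names `m_sets`, `escb_index` and `escb_step` are hypothetical. The polynomial-time approximation described in the abstract is not reproduced.

```python
import itertools
import math


def m_sets(d, m):
    """Enumerate the m-sets X = {x in {0,1}^d : sum_i x_i = m}, used here as an example arm set."""
    for support in itertools.combinations(range(d), m):
        yield tuple(1 if i in support else 0 for i in range(d))


def escb_index(x, theta_hat, counts, f_t):
    """Optimistic index x^T theta_hat + sqrt(f_t * sum_i x_i / N_i).

    theta_hat[i] is the empirical mean of item i and counts[i] = N_i is its number
    of observations. The constant in front of f_t is an assumption for illustration."""
    mean = sum(xi * ti for xi, ti in zip(x, theta_hat))
    spread = sum(xi / ni for xi, ni in zip(x, counts))
    return mean + math.sqrt(f_t * spread)


def escb_step(arms, theta_hat, counts, f_t):
    """Exact ESCB-style arm selection: one index evaluation per arm, i.e. O(|X|) work."""
    return max(arms, key=lambda x: escb_index(x, theta_hat, counts, f_t))


# Hypothetical statistics for d = 4 items and arms of size m = 2.
theta_hat = [0.9, 0.2, 0.5, 0.7]
counts = [10, 2, 5, 8]
f_t = math.log(100)  # hypothetical exploration level, e.g. ln t at round t = 100
arms = list(m_sets(d=4, m=2))
chosen = escb_step(arms, theta_hat, counts, f_t)
print(len(arms), chosen)

# The enumeration is what makes exact ESCB impractical: |X| = C(d, m) grows combinatorially in d.
print(math.comb(40, 10))
```

With these numbers, the rarely observed item 1 ($N_1 = 2$) inflates the exploration bonus enough that the index selects arm `(1, 1, 0, 0)` rather than the greedy choice `(1, 0, 0, 1)`. Replacing the enumeration over $\mathcal{X}$ by budgeted linear maximization is the step the abstract credits for the polynomial running time.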