Marco Scarsini

4papers

6citations

Novelty51%

AI Score37

Ranked #115,348 of 201,326 authors (top 57%)#267 in GT (top 69%)

4 Papers

GTJul 8, 2022

Online Learning in Supply-Chain Games

Nicolò Cesa-Bianchi, Tommaso Cesari, Takayuki Osogami et al.

We study a repeated game between a supplier and a retailer who want to maximize their respective profits without full knowledge of the problem parameters. After characterizing the uniqueness of the Stackelberg equilibrium of the stage game with complete information, we show that even with partial knowledge of the joint distribution of demand and production costs, natural learning dynamics guarantee convergence of the joint strategy profile of supplier and retailer to the Stackelberg equilibrium of the stage game. We also prove finite-time bounds on the supplier's regret and asymptotic bounds on the retailer's regret, where the specific rates depend on the type of knowledge preliminarily available to the players. In the special case when the supplier is not strategic (vertical integration), we prove optimal finite-time regret bounds on the retailer's regret (or, equivalently, the social welfare) when costs and demand are adversarially generated and the demand is censored.

GTMar 13

Dynamic Wholesale Pricing under Censored-Demand Learning

Michalis Deligiannis, Marco Scarsini, Xavier Venel

We study a finite-horizon dynamic wholesale-price contract between a manufacturer and a retailer, both of whom observe only sales, rather than the true demand. When the retailer stocks out, unmet demand is unobserved, so both parties update a common posterior over the demand distribution from sales data. Each period, the manufacturer sets the wholesale price, the retailer chooses an order quantity, and the public belief state is updated. We characterize Markov perfect equilibria as functions of this public belief. Our main results are as follows: for Weibull demand, we extend the well-known scaling approach to this strategic learning setting, prove the existence of an equilibrium, and reduce computation to a standardized one-parameter recursion; for exponential demand, we show that the equilibrium is unique and computable via a simple backward recursion.

MLFeb 16, 2021

Making the most of your day: online learning for optimal allocation of time

Etienne Boursier, Tristan Garrec, Vianney Perchet et al.

We study online learning for optimal allocation when the resource to be allocated is time. %Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration. If she rejects it, she remains on hold until a new task proposal arrives. We study the regret incurred by the agent, first when she knows her reward function but does not know the distribution of the task duration, and then when she does not know her reward function, either. This natural setting bears similarities with contextual (one-armed) bandits, but with the crucial difference that the normalized reward associated to a context depends on the whole distribution of contexts.

OCJul 20, 2020

Social Learning in Non-Stationary Environments

Etienne Boursier, Vianney Perchet, Marco Scarsini

Potential buyers of a product or service, before making their decisions, tend to read reviews written by previous consumers. We consider Bayesian consumers with heterogeneous preferences, who sequentially decide whether to buy an item of unknown quality, based on previous buyers' reviews. The quality is multi-dimensional and may occasionally vary over time; the reviews are also multi-dimensional. In the simple uni-dimensional and static setting, beliefs about the quality are known to converge to its true value. Our paper extends this result in several ways. First, a multi-dimensional quality is considered, second, rates of convergence are provided, third, a dynamical Markovian model with varying quality is studied. In this dynamical setting the cost of learning is shown to be small.