Learning to Bid in Contextual First Price Auctions
This work addresses the challenge of optimal bidding in online auctions for a single learner, offering incremental improvements in regret bounds for contextual settings.
The paper studies how to bid in repeated contextual first price auctions, proposing algorithms for binary feedback with a known noise distribution and for full information feedback with an unknown noise distribution that achieve regret of at most Õ(√(log(d) T)) and Õ(√(dT)), respectively, together with a lower bound of Ω(√T).
In this paper, we investigate the problem of how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in first price auctions: at each time $t$, the learner observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model of the maximum bid of all the others, $m_t = \alpha_0\cdot x_t + z_t$, where $\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $\mathcal{F}$ with log-concave density function $f$. We consider both \emph{binary feedback} (the learner observes only whether she wins) and \emph{full information feedback} (the learner observes $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $\mathcal{F}$ is known, we propose a bidding algorithm that uses maximum likelihood estimation (MLE) to achieve at most $\widetilde{O}(\sqrt{\log(d) T})$ regret. Moreover, we generalize this algorithm to the binary feedback setting in which the noise distribution is unknown but belongs to a parametrized family of distributions. For full information feedback with \emph{unknown} noise distribution, we provide an algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach combines an estimator for log-concave density functions with the MLE method to learn the noise distribution $\mathcal{F}$ and the linear weight $\alpha_0$ simultaneously. We also provide a lower bound showing that any bidding policy in a broad class must incur regret at least $\Omega(\sqrt{T})$, even when the learner receives full information feedback and $\mathcal{F}$ is known.
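To make the binary feedback model concrete, the following is a minimal sketch of estimating $\alpha_0$ by MLE from win/lose outcomes alone. It is an illustration under stated assumptions, not the paper's algorithm: the noise is taken to be standard logistic (one example of a known log-concave $\mathcal{F}$), bids are drawn uniformly at random for exploration, the MLE is computed by plain gradient ascent, and all function names and constants are hypothetical. Since the learner wins exactly when $z_t \le b_t - \alpha_0\cdot x_t$, the win probability is $F(b_t - \alpha_0\cdot x_t)$; a log-concave density gives log-concave $F$ and $1-F$, so the log-likelihood is concave in $\alpha$.

```python
import math
import random


def sigmoid(u):
    """CDF of the standard logistic distribution, computed stably."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def softplus(v):
    """log(1 + exp(v)), computed stably."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def simulate(alpha0, T, rng):
    """Draw contexts, exploratory bids, and binary win/lose feedback."""
    data = []
    for _ in range(T):
        x = [rng.uniform(-1.0, 1.0) for _ in alpha0]
        b = rng.uniform(-2.0, 2.0)  # assumption: bids are random exploration
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        z = math.log(p / (1.0 - p))  # logistic noise via inverse CDF
        m = dot(alpha0, x) + z       # maximum bid of all the others
        data.append((x, b, 1 if b >= m else 0))  # only the win bit is kept
    return data


def log_likelihood(alpha, data):
    """Average log-likelihood of the win/lose outcomes under alpha."""
    total = 0.0
    for x, b, w in data:
        u = b - dot(alpha, x)
        # log F(u) = -softplus(-u), log(1 - F(u)) = -softplus(u)
        total += -softplus(-u) if w else -softplus(u)
    return total / len(data)


def fit_alpha_mle(data, d, lr=6.0, iters=100):
    """Gradient ascent on the (concave) average log-likelihood."""
    alpha = [0.0] * d
    n = len(data)
    for _ in range(iters):
        grad = [0.0] * d
        for x, b, w in data:
            p = sigmoid(b - dot(alpha, x))
            for j in range(d):
                grad[j] += (p - w) * x[j]  # d/d alpha of the log-likelihood
        alpha = [a + lr * g / n for a, g in zip(alpha, grad)]
    return alpha


rng = random.Random(0)
alpha0 = [0.5, -0.3]
data = simulate(alpha0, T=5000, rng=rng)
alpha_hat = fit_alpha_mle(data, d=len(alpha0))
print("true alpha_0:", alpha0)
print("MLE estimate:", [round(a, 3) for a in alpha_hat])
```

The estimate recovers $\alpha_0$ only up to sampling error, and the sketch says nothing about regret: a bidding policy would additionally have to trade off exploration against choosing bids that are good under the current estimate.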