LG MLMay 17

Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

Xi Chen, Shibo Dai, Jiameng Lyu, Yuan Zhou

arXiv:2605.1723881.9

Predicted impact top 15% in LG · last 90 daysOriginality Highly original

AI Analysis

This work provides the first regret-optimal algorithms for joint assortment and positioning in MNL bandits, addressing a practical problem for online platforms with prompt decision needs.

This paper studies dynamic assortment selection and positioning under MNL choice models, achieving the first regret-optimal algorithms for both multiplicative and general position effects. For the multiplicative model, P2MLE-UCB attains Õ(√(NT)) regret, matching the lower bound and closing a √K gap, while for the general model, GP2-UCB achieves matching upper and lower bounds.

We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model assigning independent attraction parameters to every product--position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single feedback, and establish the first regret-optimal characterization. Besides, our round-based algorithms provide the prompt operations needed by modern platforms. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, we establish a minimax lower bound and propose GP2-UCB with a matching upper bound. Moreover, we design an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that our algorithms consistently outperform state-of-the-art benchmarks.

View on arXiv PDF

Similar