Learning Mixed Multinomial Logit Model from Ordinal Data
This work addresses a long-standing challenge: generating personalized recommendations from preference data, with applications in social choice, operations research, and revenue management. Its contribution is incremental in that it builds on existing tensor decomposition and Rank Centrality methods.
The paper tackles the problem of learning a mixture of Multinomial Logit (MNL) models from partial ordinal data, such as pairwise comparisons, a problem that is infeasible in general even for two components. It presents a sufficient condition and an efficient algorithm that learns a mixture of $r$ MNL components over $n$ objects from a number of samples scaling as $r^{3.5}n^3(\log n)^4$, provided $r\ll n^{2/7}$ and the model parameters are sufficiently incoherent.
Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long-standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of single MNL models (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture of two MNL components is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a manner that is both computationally and statistically efficient. We present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preference/comparison data. In particular, a mixture of $r$ MNL components over $n$ objects can be learnt using samples whose size scales polynomially in $n$ and $r$ (concretely, $r^{3.5}n^3(\log n)^4$, with $r\ll n^{2/7}$ when the model parameters are sufficiently incoherent). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using Rank Centrality introduced by Negahban et al. In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.
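To make the second phase concrete, the following is a minimal sketch of the Rank Centrality idea for a single MNL component. It is not the paper's implementation. It assumes that the pair-wise marginals of one component are already available (the tensor decomposition phase is not shown) and that every pair of objects has been compared. The function names `pairwise_marginals` and `rank_centrality` are hypothetical and chosen for illustration only.

```python
import numpy as np


def pairwise_marginals(weights):
    """Exact MNL pair-wise marginals: P[i, j] = Pr(j is preferred to i).

    Hypothetical helper; in the two-phase algorithm these marginals would
    instead be estimated per component by the tensor decomposition phase.
    """
    w = np.asarray(weights, dtype=float)
    P = w[None, :] / (w[:, None] + w[None, :])
    np.fill_diagonal(P, 0.0)
    return P


def rank_centrality(P, num_iters=10_000, tol=1e-12):
    """Estimate normalized MNL weights from pair-wise marginals.

    P[i, j] is the (estimated) probability that j is preferred to i.
    Assumption: all pairs are compared, so the comparison graph is complete.
    """
    n = P.shape[0]
    d_max = n - 1  # maximum degree of the complete comparison graph
    # Random walk: move from i to j with probability proportional to how
    # often j beats i; the remaining probability mass stays at i.
    T = P / d_max
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))
    # Power iteration for the stationary distribution pi = pi @ T.
    pi = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        nxt = pi @ T
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break
    return pi / pi.sum()


if __name__ == "__main__":
    true_w = np.array([4.0, 3.0, 2.0, 1.0])
    print(rank_centrality(pairwise_marginals(true_w)))
```

With exact marginals, the walk satisfies detailed balance with respect to the true weights, so its stationary distribution equals the weights normalized to sum to one. With marginals estimated from finite samples, the output is only an approximation.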