LGMay 30, 2025

Learning Distributions over Permutations and Rankings with Factorized Representations

arXiv:2505.24664v11 citationsh-index: 11
Originality Highly original
AI Analysis

This work addresses a fundamental problem in machine learning for applications such as ranking and combinatorial optimization, offering a more flexible and efficient method compared to existing approaches.

The authors tackled the problem of learning probability distributions over permutations by proposing a novel approach using alternative representations like Lehmer codes, which allows unconstrained learning with conventional deep learning techniques and can represent any distribution over permutations. Their method significantly outperforms current approaches on the jigsaw puzzle benchmark and learns non-trivial distributions in new benchmarks like cyclic permutations and movie re-ranking, where traditional models fail.

Learning distributions over permutations is a fundamental problem in machine learning, with applications in ranking, combinatorial optimization, structured prediction, and data association. Existing methods rely on mixtures of parametric families or neural networks with expensive variational inference procedures. In this work, we propose a novel approach that leverages alternative representations for permutations, including Lehmer codes, Fisher-Yates draws, and Insertion-Vectors. These representations form a bijection with the symmetric group, allowing for unconstrained learning using conventional deep learning techniques, and can represent any probability distribution over permutations. Our approach enables a trade-off between expressivity of the model family and computational requirements. In the least expressive and most computationally efficient case, our method subsumes previous families of well established probabilistic models over permutations, including Mallow's and the Repeated Insertion Model. Experiments indicate our method significantly outperforms current approaches on the jigsaw puzzle benchmark, a common task for permutation learning. However, we argue this benchmark is limited in its ability to assess learning probability distributions, as the target is a delta distribution (i.e., a single correct solution exists). We therefore propose two additional benchmarks: learning cyclic permutations and re-ranking movies based on user preference. We show that our method learns non-trivial distributions even in the least expressive mode, while traditional models fail to even generate valid permutations in this setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes