Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
This work addresses the underexplored ability of LLMs to handle algorithmic reasoning with ranked preferences. Its contribution is incremental: it systematically exposes limitations but does not propose new methods.
The study evaluates Large Language Models (LLMs) on reasoning tasks involving ranked preferences in matching markets. It finds that even top-performing models struggle to resolve instability in large markets, and that fine-tuning improves performance only on small instances.
The rise of Large Language Models (LLMs) has driven progress in reasoning tasks -- from program synthesis to scientific hypothesis generation -- yet their ability to handle ranked preferences and structured algorithms in combinatorial domains remains underexplored. We study matching markets, a core framework behind applications like resource allocation and ride-sharing, which require reconciling individual ranked preferences to ensure stable outcomes. We evaluate several state-of-the-art models on a hierarchy of preference-based reasoning tasks -- ranging from stable-matching generation to instability detection, instability resolution, and fine-grained preference queries -- to systematically expose their logical and algorithmic limitations in handling ranked inputs. Surprisingly, even top-performing models with advanced reasoning struggle to resolve instability in large markets, often failing to identify blocking pairs or execute algorithms iteratively. We further show that parameter-efficient fine-tuning (LoRA) significantly improves performance in small markets, but fails to bring about a similar improvement on large instances, suggesting the need for more sophisticated strategies to improve LLMs' reasoning with larger-context inputs.
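To make the tasks named above concrete, the following is a minimal Python sketch of stable-matching generation and instability detection. It assumes the classical deferred acceptance (Gale-Shapley) procedure, which the abstract does not name explicitly, and a one-to-one market with complete, strict preference lists and equally many agents on each side. The function names, agent labels, and example preferences are hypothetical and are not taken from the paper's benchmark.

```python
from collections import deque


def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Return a proposer -> receiver stable matching (assumed Gale-Shapley variant)."""
    # rank[r][p]: position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = deque(proposer_prefs)

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # r was unmatched: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # r trades up; old partner is free
            free.append(current)
        else:
            free.append(p)                        # r rejects p; p tries again later

    return {p: r for r, p in engaged_to.items()}


def blocking_pairs(matching, proposer_prefs, receiver_prefs):
    """List every (proposer, receiver) pair who both prefer each other to their partners."""
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    partner_of = {r: p for p, r in matching.items()}
    pairs = []
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break                             # remaining receivers are worse for p
            if rank[r][p] < rank[r][partner_of[r]]:
                pairs.append((p, r))
    return pairs


# Hypothetical 3x3 market used only for illustration.
men = {"m1": ["w1", "w2", "w3"], "m2": ["w1", "w3", "w2"], "m3": ["w2", "w1", "w3"]}
women = {"w1": ["m2", "m1", "m3"], "w2": ["m1", "m3", "m2"], "w3": ["m3", "m2", "m1"]}

stable = deferred_acceptance(men, women)
unstable = {"m1": "w1", "m2": "w2", "m3": "w3"}
print(stable)
print(blocking_pairs(stable, men, women))
print(blocking_pairs(unstable, men, women))
```

A matching is stable exactly when `blocking_pairs` returns an empty list, so the same check serves both as the instability-detection task and as a verifier for a model's proposed matching. Instability resolution would then amount to repairing a matching until that list is empty.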