DB · AI · IR · Aug 30, 2025

Access Paths for Efficient Ordering with Large Language Models

Fuheng Zhao, Jiayue Chen, Yiming Pan, Tahseen Rabbani, Divyakant Agrawal, Amr El Abbadi

arXiv:2509.00303v1 · 4.33 citations · h-index: 74

Originality: Incremental advance

AI Analysis

This work addresses the challenge of ordering data with LLMs in data systems, offering incremental improvements in efficiency and accuracy.

The paper tackles the problem of efficiently ordering data with large language models (LLMs). It introduces the LLM ORDER BY operator and studies its physical implementations. The results show that no single approach is universally optimal. Among the new designs, an agreement-based batch-size policy helps choose batch sizes for value-based methods, a majority voting mechanism strengthens pairwise comparisons on GPT-4o, and a two-way external merge sort achieves high accuracy-efficiency trade-offs across datasets and models.
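The summary does not spell out how the two-way external merge sort is adapted for LLMs. The Python sketch below shows one plausible reading, under stated assumptions: pass 0 splits the input into runs of `batch_size` items that each fit in one prompt and orders each run with a single list-wise call, and later passes merge runs two at a time using a pairwise comparator. The names `two_way_external_merge_sort`, `sort_batch`, `less_than`, and `batch_size` are hypothetical, and both LLM calls are replaced by deterministic mocks so the example runs offline.

```python
from typing import Callable, List, Sequence

def two_way_external_merge_sort(
    items: Sequence[str],
    sort_batch: Callable[[List[str]], List[str]],  # hypothetical list-wise LLM call
    less_than: Callable[[str, str], bool],          # hypothetical pairwise LLM call
    batch_size: int,
) -> List[str]:
    """Sketch: order `items` when the model can only see small inputs at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Pass 0: build sorted runs, each small enough for a single prompt.
    runs = [sort_batch(list(items[i:i + batch_size]))
            for i in range(0, len(items), batch_size)]
    # Passes 1..k: merge runs two at a time until one run remains.
    while len(runs) > 1:
        next_runs = []
        for j in range(0, len(runs), 2):
            if j + 1 < len(runs):
                next_runs.append(_merge(runs[j], runs[j + 1], less_than))
            else:
                next_runs.append(runs[j])  # an odd run carries over unchanged
        runs = next_runs
    return runs[0] if runs else []

def _merge(left: List[str], right: List[str],
           less_than: Callable[[str, str], bool]) -> List[str]:
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Take from `right` only when it is strictly smaller (stable merge).
        if less_than(right[j], left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

# Mock "LLM" calls that count how often each kind of call is made.
calls = {"batch": 0, "pair": 0}

def fake_sort_batch(batch: List[str]) -> List[str]:
    calls["batch"] += 1
    return sorted(batch)

def fake_less_than(a: str, b: str) -> bool:
    calls["pair"] += 1
    return a < b

words = ["pear", "apple", "fig", "kiwi", "banana", "cherry", "date"]
merged_words = two_way_external_merge_sort(words, fake_sort_batch, fake_less_than, batch_size=2)
print(merged_words)  # ['apple', 'banana', 'cherry', 'date', 'fig', 'kiwi', 'pear']
```

With 7 items and `batch_size=2`, pass 0 issues 4 list-wise calls, and each later pass halves the number of runs. Counting calls this way is one simple handle on the compute-cost side of the accuracy-efficiency trade-off.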

We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework. Our experiments show that no single approach is universally optimal: effectiveness depends on query characteristics and data. We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs. In extensive experiments, our agreement-based procedure proves effective at determining batch size for value-based methods, the majority voting mechanism consistently strengthens pairwise comparisons on GPT-4o, and external merge sort achieves high accuracy-efficiency trade-offs across datasets and models. We further observe a log-linear scaling between compute cost and ordering quality, offering a first step toward principled cost models for LLM-powered data systems.
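The abstract names a majority voting mechanism for pairwise sorting without giving its details. The sketch below assumes the simplest form: each pairwise question "is a before b?" is put to an odd number of voters (for example, different prompt wordings or repeated sampled calls), and the majority answer is used as the comparator inside an ordinary sort. The names `majority_less_than`, `majority_cmp`, and the mock voters are hypothetical; real LLM calls are replaced by plain functions, one of which simulates a position-biased judge.

```python
import functools
from typing import Callable, Sequence

Comparator = Callable[[str, str], bool]

def majority_less_than(a: str, b: str, voters: Sequence[Comparator]) -> bool:
    """Sketch: return True if a strict majority of voters judge a < b."""
    if len(voters) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    yes = sum(1 for vote in voters if vote(a, b))
    return yes > len(voters) // 2

def correct_judge(a: str, b: str) -> bool:
    # Mock voter that always answers correctly (lexicographic order).
    return a < b

def biased_judge(a: str, b: str) -> bool:
    # Mock voter with position bias: always prefers the first item shown.
    return True

voters = [correct_judge, correct_judge, biased_judge]

def majority_cmp(a: str, b: str) -> int:
    # Turn the boolean majority vote into a three-way comparator for sorted().
    if majority_less_than(a, b, voters):
        return -1
    if majority_less_than(b, a, voters):
        return 1
    return 0

print(majority_less_than("pear", "apple", [biased_judge]))  # True (single biased vote is wrong)
print(majority_less_than("pear", "apple", voters))          # False (majority outvotes the bias)
ranked = sorted(["pear", "apple", "fig"], key=functools.cmp_to_key(majority_cmp))
print(ranked)  # ['apple', 'fig', 'pear']
```

The design choice here is to pay several calls per comparison in exchange for robustness to individual wrong answers; how much that helps depends on how correlated the voters' errors are, which a mock like this cannot capture.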
