OC LG Dec 22, 2025

Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithm on Stochastic Smooth Functions

arXiv:2512.19104v1 4.1 h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses a gap in the theoretical understanding of rank-based optimization, which matters for applications such as reinforcement learning from human feedback and preference learning. Its contribution toward bridging theory and practice is incremental.

The paper tackles the problem of zeroth-order optimization with only ordinal feedback under stochastic objectives, establishing explicit non-asymptotic query complexity bounds that match the best-known results for value-based algorithms.

Zeroth-order (ZO) optimization with ordinal feedback has emerged as a fundamental problem in modern machine learning systems, particularly in human-in-the-loop settings such as reinforcement learning from human feedback, preference learning, and evolutionary strategies. While rank-based ZO algorithms enjoy strong empirical success and robustness properties, their theoretical understanding, especially under stochastic objectives and standard smoothness assumptions, remains limited. In this paper, we study rank-based zeroth-order optimization for stochastic functions where only ordinal feedback of the stochastic function is available. We propose a simple and computationally efficient rank-based ZO algorithm. Under standard assumptions including smoothness, strong convexity, and bounded second moments of stochastic gradients, we establish explicit non-asymptotic query complexity bounds for both convex and nonconvex objectives. Notably, our results match the best-known query complexities of value-based ZO algorithms, demonstrating that ordinal information alone is sufficient for optimal query efficiency in stochastic settings. Our analysis departs from existing drift-based and information-geometric techniques, offering new tools for the study of rank-based optimization under noise. These findings narrow the gap between theory and practice and provide a principled foundation for optimization driven by human preferences.
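To make the setting concrete, here is a minimal sketch of what a rank-based zeroth-order step can look like. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function name `rank_based_zo_step`, the parameters `n_dirs`, `top_k`, `mu`, and `lr`, and the noisy quadratic test objective are all assumptions made for illustration. The noisy values are computed only to simulate a ranking oracle; the update itself uses nothing but their ordering.

```python
import numpy as np

def rank_based_zo_step(x, noisy_f, rng, n_dirs=20, top_k=5, mu=0.1, lr=0.1):
    """One illustrative rank-based ZO step (hypothetical, not the paper's method)."""
    d = x.shape[0]
    # Sample random Gaussian search directions.
    U = rng.standard_normal((n_dirs, d))
    # Simulate the ranking oracle: evaluate noisy values, keep only their order.
    values = np.array([noisy_f(x + mu * u) for u in U])
    order = np.argsort(values)  # ascending, so the best-ranked points come first
    # Average the best-ranked and worst-ranked directions.
    best = U[order[:top_k]].mean(axis=0)
    worst = U[order[-top_k:]].mean(axis=0)
    # Move toward the best-ranked directions and away from the worst-ranked ones.
    return x + lr * (best - worst)

# Assumed test problem: a quadratic observed through additive Gaussian noise.
rng = np.random.default_rng(0)
noisy_quadratic = lambda z: 0.5 * z @ z + 0.01 * rng.standard_normal()

x0 = np.full(10, 5.0)
x = x0.copy()
for _ in range(200):
    x = rank_based_zo_step(x, noisy_quadratic, rng)

print("initial distance to optimum:", np.linalg.norm(x0))
print("final distance to optimum:  ", np.linalg.norm(x))
```

With a fixed step size, this sketch is only expected to reach a neighborhood of the minimizer. The query complexity bounds described in the abstract concern the paper's own algorithm and assumptions, not this toy update.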
