AI Aug 31, 2025
Sharpe Ratio Optimization in Markov Decision Processes
Shuai Ma, Guangwu Liu, Li Xia
The Sharpe ratio (also known as the reward-to-variability ratio) is a widely used metric in finance that measures the additional return per unit of increased risk (standard deviation of return). However, optimizing the Sharpe ratio in Markov decision processes (MDPs) is challenging because two difficulties hinder the application of dynamic programming. One is that dynamic programming does not work for fractional objectives, and the other is that dynamic programming is invalid for risk metrics. In this paper, we study Sharpe ratio optimization in infinite-horizon MDPs, considering both the long-run average and discounted settings. We address the first challenge with Dinkelbach's transform, which converts the Sharpe ratio objective to a mean-squared-variance (M2V) objective. It is shown that the M2V optimization and the original Sharpe ratio optimization share the same optimal policy when the risk-sensitive parameter is equal to the optimal Sharpe ratio. For the second challenge, we develop an iterative algorithm to solve the M2V optimization, which is similar to a mean-variance optimization in MDPs. We iteratively solve the M2V problem and obtain the associated Sharpe ratio, which is used to update the risk-sensitive parameter in the next iteration. We show that the resulting sequence of Sharpe ratios is monotonically increasing and converges to the optimal Sharpe ratio. For both average and discounted MDP settings, we develop a policy iteration procedure and prove its convergence to the optimum. Numerical experiments are conducted for validation. To the best of our knowledge, our approach is the first that solves the Sharpe ratio optimization in MDPs with dynamic programming type algorithms. We believe that the proposed algorithm can shed light on solving MDPs with other fractional objectives.
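The parameter-update loop described in this abstract follows the general pattern of Dinkelbach's method for fractional objectives. The sketch below is only an illustration of that generic pattern on a toy problem, not the paper's M2V policy iteration: it assumes a small finite set of hypothetical candidates, each summarized by a (mean, standard deviation) pair with positive standard deviation, and the function name `dinkelbach_max_ratio` is invented for this example.

```python
from typing import List, Tuple


def dinkelbach_max_ratio(
    candidates: List[Tuple[float, float]],
    tol: float = 1e-12,
    max_iter: int = 100,
) -> Tuple[int, float, List[float]]:
    """Maximize mean / std over a finite candidate set with Dinkelbach's method.

    Each candidate is a hypothetical (mean, std) pair with std > 0.
    Returns the best index, the best ratio, and the sequence of ratios visited.
    """
    # Start from the ratio of an arbitrary candidate.
    lam = candidates[0][0] / candidates[0][1]
    history = [lam]
    best = 0
    for _ in range(max_iter):
        # Parametric (non-fractional) subproblem: maximize mean - lam * std.
        values = [mean - lam * std for mean, std in candidates]
        best = max(range(len(candidates)), key=values.__getitem__)
        # A subproblem optimum of zero means lam is already the optimal ratio.
        if values[best] <= tol:
            break
        # Otherwise update the parameter to the ratio of the subproblem's solution.
        lam = candidates[best][0] / candidates[best][1]
        history.append(lam)
    return best, lam, history


# Toy data (assumed for illustration only): (mean, std) pairs.
candidates = [(1.0, 2.0), (3.0, 4.0), (2.0, 1.0), (5.0, 4.0)]
best_index, best_ratio, ratios = dinkelbach_max_ratio(candidates)
print(best_index, best_ratio, ratios)
```

On this toy data the visited ratios increase at every update, which mirrors the monotonicity property the abstract states for its sequence of Sharpe ratios. In the MDP setting the inner maximization is not an enumeration but a policy optimization problem, which is where the paper's contribution lies.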
OC Feb 28, 2025
Enhanced Derivative-Free Optimization Using Adaptive Correlation-Induced Finite Difference Estimators
Guo Liang, Guangwu Liu, Kun Zhang
Gradient-based methods are well-suited for derivative-free optimization (DFO), where finite-difference (FD) estimates are commonly used as gradient surrogates. Traditional stochastic approximation methods, such as Kiefer-Wolfowitz (KW) and simultaneous perturbation stochastic approximation (SPSA), typically utilize only two samples per iteration, resulting in imprecise gradient estimates and necessitating diminishing step sizes for convergence. In this paper, we first explore an efficient FD estimate, referred to as correlation-induced FD estimate, which is a batch-based estimate. Then, we propose an adaptive sampling strategy that dynamically determines the batch size at each iteration. By combining these two components, we develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. Furthermore, we establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods. Additionally, we propose a novel stochastic line search technique to adaptively tune the step size in practice. Finally, comprehensive numerical experiments confirm the superior empirical performance of the proposed algorithm.
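For context, the two-sample baseline that this abstract compares against can be sketched as follows. This is a minimal one-dimensional Kiefer-Wolfowitz iteration, not the proposed algorithm: the test objective `noisy_quadratic`, the gain sequences `a / k` and `c / k ** 0.25`, and all constants are assumptions chosen for illustration.

```python
import random


def noisy_quadratic(x: float, rng: random.Random) -> float:
    """Hypothetical noisy objective: (x - 3)^2 plus Gaussian noise."""
    return (x - 3.0) ** 2 + rng.gauss(0.0, 0.1)


def kiefer_wolfowitz(f, x0: float, n_iter: int, rng: random.Random,
                     a: float = 0.5, c: float = 0.5) -> float:
    """Minimize f with two noisy evaluations per iteration."""
    x = x0
    for k in range(1, n_iter + 1):
        a_k = a / k            # diminishing step size
        c_k = c / k ** 0.25    # diminishing perturbation size
        # Central finite-difference gradient surrogate from two samples.
        grad = (f(x + c_k, rng) - f(x - c_k, rng)) / (2.0 * c_k)
        x -= a_k * grad
    return x


rng = random.Random(0)
x_hat = kiefer_wolfowitz(noisy_quadratic, x0=0.0, n_iter=2000, rng=rng)
print(x_hat)  # should land near the minimizer at 3
```

Because each gradient surrogate uses only two noisy samples, the step sizes must shrink for the iterates to settle, which is the limitation the batch-based estimate and adaptive batch sizes are meant to address.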
ME May 9, 2024
A Correlation-induced Finite Difference Estimator
Guo Liang, Guangwu Liu, Kun Zhang
Finite difference (FD) approximation is a classic approach to stochastic gradient estimation when only noisy function realizations are available. In this paper, we first provide a sample-driven method via the bootstrap technique to estimate the optimal perturbation, and then propose an efficient FD estimator based on correlated samples at the estimated optimal perturbation. Furthermore, theoretical analyses of both the perturbation estimator and the FD estimator reveal that, surprisingly, the correlation enables the proposed FD estimator to achieve a reduction in variance and, in some cases, a decrease in bias compared to the traditional optimal FD estimator. Numerical results confirm the efficiency of our estimators and align well with the theory presented, especially in scenarios with small sample sizes. Finally, we apply the estimator to solve derivative-free optimization (DFO) problems, and numerical studies show that DFO problems with 100 dimensions can be effectively solved.
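As a point of reference for the traditional estimator mentioned in this abstract, the sketch below averages independent central finite differences of a noisy function. It is the plain i.i.d. batch estimator with a hand-picked perturbation, not the correlation-induced estimator or the bootstrap perturbation selection; the objective `noisy_sin`, the perturbation `h`, and the batch size are assumptions for illustration.

```python
import math
import random


def noisy_sin(x: float, rng: random.Random, noise_sd: float = 0.01) -> float:
    """Hypothetical noisy objective: sin(x) plus Gaussian noise."""
    return math.sin(x) + rng.gauss(0.0, noise_sd)


def batch_central_fd(f, x: float, h: float, n_pairs: int,
                     rng: random.Random) -> float:
    """Average n_pairs independent central finite differences at x."""
    total = 0.0
    for _ in range(n_pairs):
        total += (f(x + h, rng) - f(x - h, rng)) / (2.0 * h)
    return total / n_pairs


rng = random.Random(1)
estimate = batch_central_fd(noisy_sin, x=0.0, h=0.1, n_pairs=400, rng=rng)
print(estimate)  # true derivative is cos(0) = 1
```

The perturbation `h` controls a bias-variance trade-off: a smaller `h` reduces the truncation bias but amplifies the noise through the `1 / (2 * h)` factor, which is why choosing the perturbation from the samples matters.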