PM Jun 10, 2018
Optimal Control of Constrained Stochastic Linear-Quadratic Model with Applications
Weiping Wu, Jianjun Gao, Junguo Lu et al.
This paper studies a class of continuous-time, scalar-state stochastic Linear-Quadratic (LQ) optimal control problems with linear control constraints. Applying the state separation theorem induced by the problem's special structure, we derive an explicit solution for this class of problems. The resulting optimal control policy is a piecewise affine function of the system state and can be computed efficiently by solving two Riccati equations off-line. Under some mild conditions, a stationary optimal control policy can also be derived for the infinite-horizon version of the problem. This result can be used to solve the constrained dynamic mean-variance portfolio selection problem. Examples illustrate the solution procedure.
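As a rough illustration of what a piecewise affine policy on a scalar state looks like, the sketch below picks one of two (gain, offset) pairs depending on the sign of the state. The function name, the split at zero, and all numeric values are assumptions made for illustration; in the paper the coefficients would come from the two Riccati equations solved off-line, which are not reproduced here.

```python
from typing import Tuple


def piecewise_affine_policy(
    x: float,
    branch_nonneg: Tuple[float, float],
    branch_neg: Tuple[float, float],
) -> float:
    """Evaluate u(x) = gain * x + offset, with (gain, offset) chosen by the sign of x.

    branch_nonneg is used when x >= 0 and branch_neg when x < 0.
    Both pairs are hypothetical placeholders, not values from the paper.
    """
    gain, offset = branch_nonneg if x >= 0 else branch_neg
    return gain * x + offset


# Placeholder coefficients standing in for quantities computed off-line.
u_pos = piecewise_affine_policy(2.0, (-0.5, 0.0), (-1.5, 0.0))
u_neg = piecewise_affine_policy(-2.0, (-0.5, 0.0), (-1.5, 0.0))
```

Because the coefficients are fixed ahead of time, evaluating the control on-line costs one comparison and one multiply-add per step.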
RO Sep 13, 2024
AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models
Yifei Yao, Wentao He, Chenyu Gu et al.
Training and deploying reinforcement learning (RL) policies for robots, especially for accomplishing specific tasks, present substantial challenges. Recent work has explored diverse reward function designs, training techniques, simulation-to-reality (sim-to-real) transfer, and performance analysis methodologies, yet these still require significant human intervention. This paper introduces an end-to-end framework for training and deploying RL policies, guided by Large Language Models (LLMs), and evaluates its effectiveness on bipedal robots. The framework consists of three interconnected modules: an LLM-guided reward function design module, an RL training module leveraging prior work, and a sim-to-real homomorphic evaluation module. This design significantly reduces the need for human input by relying only on essential simulation and deployment platforms, with the option to incorporate human-engineered strategies and historical data. We detail the construction of these modules and their advantages over traditional approaches, and we demonstrate the framework's capability to autonomously develop and refine control strategies for bipedal robot locomotion, showcasing its potential to operate independently of human intervention.
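One way to picture how three such modules might interact is a loop in which evaluation feedback drives the next reward proposal. The sketch below is an assumption about the control flow, not the framework's actual interface: `refine_policy` and its three callables are hypothetical stand-ins for the reward design, training, and evaluation modules.

```python
from typing import Callable, List, Tuple


def refine_policy(
    propose_reward: Callable[[str], str],
    train: Callable[[str], object],
    evaluate: Callable[[object], Tuple[float, str]],
    rounds: int,
) -> Tuple[str, float]:
    """Run a propose -> train -> evaluate loop and return the best (reward spec, score).

    All three callables are hypothetical placeholders supplied by the caller.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    feedback = ""  # the first proposal is made without any evaluation feedback
    history: List[Tuple[str, float]] = []
    for _ in range(rounds):
        reward_spec = propose_reward(feedback)  # stand-in for reward design
        policy = train(reward_spec)             # stand-in for RL training
        score, feedback = evaluate(policy)      # stand-in for evaluation
        history.append((reward_spec, score))
    # Keep the reward specification whose trained policy scored highest.
    return max(history, key=lambda item: item[1])
```

Passing the modules in as callables keeps the loop independent of any particular LLM, simulator, or robot platform.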