Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning
This work provides practical guidelines for deploying RL in industrial process control, but its contribution is incremental: it examines specific design choices rather than delivering a breakthrough.
The paper tackles the sim-to-real gap in reinforcement learning for industrial process control by analyzing how Markov Decision Process design choices affect transfer. In a color mixing task, it finds that physics-based dynamics models achieve up to 50% real-world success under strict precision constraints, where simplified models fail entirely.
Reinforcement Learning (RL) has demonstrated strong potential for industrial process control, yet policies trained in simulation often suffer from a significant sim-to-real gap when deployed on physical hardware. This work systematically analyzes how core Markov Decision Process (MDP) design choices -- state composition, target inclusion, reward formulation, termination criteria, and environment dynamics models -- affect this transfer. Using a color mixing task, we evaluate different MDP configurations and mixing dynamics across simulation and real-world experiments. We validate our findings on physical hardware, demonstrating that physics-based dynamics models achieve up to 50% real-world success under strict precision constraints where simplified models fail entirely. Our results provide practical MDP design guidelines for deploying RL in industrial process control.
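To make the design axes named in the abstract concrete, the following is a minimal sketch of how target inclusion, reward formulation, and termination criteria might be exposed as configuration switches in a toy color mixing environment. It is not the authors' implementation: the names `MDPConfig`, `ColorMixEnv`, and `PIGMENTS`, the three-pigment RGB setup, the default tolerance and horizon, and the volume-weighted linear mixing rule are all assumptions made for illustration. The linear rule stands in for a simplified dynamics model, not a physics-based one.

```python
import math
from dataclasses import dataclass


@dataclass
class MDPConfig:
    """Hypothetical switches for the MDP design choices discussed above."""
    include_target: bool = True   # target inclusion: append target color to the state
    dense_reward: bool = True     # reward formulation: negative distance vs. sparse bonus
    tolerance: float = 0.05       # termination: success threshold on color distance
    max_steps: int = 20           # termination: episode horizon


# Assumed pigment colors in RGB, each channel in [0, 1].
PIGMENTS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


class ColorMixEnv:
    """Toy environment: each action dispenses one unit of a single pigment."""

    def __init__(self, config, target):
        self.config = config
        self.target = tuple(target)
        self.reset()

    def reset(self):
        self.volumes = [0.0] * len(PIGMENTS)
        self.steps = 0
        return self._observe()

    def _color(self):
        # Simplified dynamics (an assumption): volume-weighted average of pigments.
        total = sum(self.volumes)
        if total == 0.0:
            return (0.0, 0.0, 0.0)
        return tuple(
            sum(v * p[c] for v, p in zip(self.volumes, PIGMENTS)) / total
            for c in range(3)
        )

    def _observe(self):
        # State composition: current color, optionally followed by the target.
        obs = self._color()
        if self.config.include_target:
            obs = obs + self.target
        return obs

    def step(self, action):
        self.volumes[action] += 1.0
        self.steps += 1
        distance = math.dist(self._color(), self.target)
        success = distance <= self.config.tolerance
        if self.config.dense_reward:
            reward = -distance
        else:
            reward = 1.0 if success else 0.0
        done = success or self.steps >= self.config.max_steps
        return self._observe(), reward, done, {"success": success, "distance": distance}
```

As a usage example, `ColorMixEnv(MDPConfig(dense_reward=False), target=(0.5, 0.5, 0.0))` followed by `step(0)` and `step(1)` dispenses one unit each of the first two pigments and ends the episode with a sparse success reward. Swapping `_color` for a different mixing model would be one way to compare dynamics models while holding the rest of the MDP fixed.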