Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management
This work addresses operational constraints such as energy consumption and mechanical wear in real-world applications like building energy management. It represents an incremental improvement backed by practical validation.
The paper tackles erratic control behavior in deep reinforcement learning agents by investigating higher-order action regularization. It shows that third-order derivative penalties achieve superior smoothness while maintaining competitive performance across four continuous control environments, and that smooth policies reduce equipment switching by 60% in HVAC systems.
Deep reinforcement learning agents often exhibit erratic, high-frequency control behaviors that hinder real-world deployment due to excessive energy consumption and mechanical wear. We systematically investigate action smoothness regularization through higher-order derivative penalties, progressing from theoretical understanding in continuous control benchmarks to practical validation in building energy management. Our comprehensive evaluation across four continuous control environments demonstrates that third-order derivative penalties (jerk minimization) consistently achieve superior smoothness while maintaining competitive performance. We extend these findings to HVAC control systems where smooth policies reduce equipment switching by 60%, translating to significant operational benefits. Our work establishes higher-order action regularization as an effective bridge between RL optimization and operational constraints in energy-critical applications.
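The abstract does not state the exact form of the penalty, so the following is only a minimal sketch under stated assumptions: the "derivative" of the action signal is approximated by finite differences over consecutive time steps, the penalty is the weighted sum of squared differences, and it is subtracted from the environment reward. The names `smoothness_penalty`, `regularized_reward`, `order`, and `weight` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def smoothness_penalty(actions, order=3, weight=0.1):
    """Weighted sum of squared order-th finite differences of an action sequence.

    Assumption: derivatives are approximated by discrete differences, so
    order=1 penalizes action changes, order=2 their rate of change, and
    order=3 the "jerk" term a_t - 3*a_{t-1} + 3*a_{t-2} - a_{t-3}.

    actions: array-like of shape (T,) or (T, action_dim), oldest action first.
    Returns 0.0 when fewer than order + 1 actions are available.
    """
    actions = np.asarray(actions, dtype=float)
    if actions.ndim == 1:
        actions = actions[:, None]  # treat as a sequence of 1-D actions
    if actions.shape[0] <= order:
        return 0.0
    diffs = np.diff(actions, n=order, axis=0)  # shape (T - order, action_dim)
    return float(weight * np.sum(diffs ** 2))


def regularized_reward(env_reward, recent_actions, order=3, weight=0.1):
    """Environment reward minus the smoothness penalty (hypothetical shaping)."""
    return env_reward - smoothness_penalty(recent_actions, order, weight)


if __name__ == "__main__":
    smooth = [0.0, 1.0, 4.0, 9.0, 16.0]   # quadratic in time: zero third difference
    erratic = [0.0, 1.0, 0.0, 1.0, 0.0]   # high-frequency switching
    print(smoothness_penalty(smooth, order=3, weight=1.0))
    print(smoothness_penalty(erratic, order=3, weight=1.0))
```

In this sketch, a sequence that varies quadratically in time incurs no third-order penalty, whereas rapid on/off switching is penalized heavily. This illustrates why a jerk term can discourage frequent equipment switching without forbidding steady ramps.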