GRPOformer: Advancing Hyperparameter Optimization via Group Relative Policy Optimization
This work addresses efficiency and performance limitations in hyperparameter optimization for machine learning practitioners, though its contribution appears incremental, since it adapts existing RL techniques to HPO.
The paper tackles hyperparameter optimization by proposing GRPOformer, a framework that integrates reinforcement learning with Transformers and outperforms baseline methods on OpenML tasks.
Hyperparameter optimization (HPO) plays a critical role in improving model performance. Transformer-based HPO methods have shown great potential; however, existing approaches rely heavily on large-scale historical optimization trajectories and lack effective reinforcement learning (RL) techniques, which limits their efficiency and performance. Inspired by the success of Group Relative Policy Optimization (GRPO) in large language models (LLMs), we propose GRPOformer, a novel HPO framework that integrates RL with Transformers. In GRPOformer, Transformers generate new hyperparameter configurations from historical optimization trajectories, while GRPO enables rapid trajectory construction and learning of the optimization strategy from scratch. Moreover, we introduce Policy Churn Regularization (PCR) to enhance the stability of GRPO training. Experimental results on OpenML demonstrate that GRPOformer consistently outperforms baseline methods across diverse tasks, offering new insights into the application of RL for HPO.
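To make the "group relative" idea concrete, the following is a minimal sketch of the advantage computation commonly associated with GRPO: each candidate in a sampled group is scored against the mean and standard deviation of that group's rewards, so no learned value function is needed. This illustrates the general technique only, not GRPOformer's implementation. The function name `group_relative_advantages`, the `eps` term, and the reward values are assumptions made for this example; the abstract specifies neither the reward definition nor the form of PCR.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    `rewards` holds one scalar per candidate hyperparameter configuration
    sampled in the same group. `eps` (an assumption) avoids division by
    zero when every candidate receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical validation accuracies for a group of four sampled configurations.
rewards = [0.81, 0.79, 0.90, 0.70]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Configurations that score above the group mean receive positive advantages and are reinforced, and those below the mean are penalized. Because the baseline comes from the group itself, the signal is available as soon as a group of configurations has been evaluated.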