Prototypical Reward Network for Data-Efficient RLHF
This work addresses scalability issues in RLHF for fine-tuning Large Language Models by offering a more data-efficient approach to optimizing language models under restricted feedback conditions. The contribution appears incremental, however, since it builds on existing RLHF methods.
The paper tackles the resource-intensive collection of human feedback for Reinforcement Learning from Human Feedback (RLHF) by proposing Proto-RM, a framework that uses prototypical networks to enhance reward models trained on limited data. The results show that Proto-RM significantly improves performance on human feedback tasks while requiring much less data, achieving results comparable to or better than those of traditional methods.
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). However, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework, Proto-RM, leverages prototypical networks to enhance reward models under limited human feedback. By enabling stable and reliable structural learning from fewer samples, Proto-RM significantly enhances LLMs' adaptability and accuracy in interpreting human preferences. Extensive experiments on various datasets demonstrate that, in data-limited scenarios, Proto-RM significantly improves the performance of reward models and LLMs in human feedback tasks, achieving results comparable to, and usually better than, those of traditional methods while requiring significantly less data. This research offers a promising direction for enhancing the efficiency of reward models and optimizing the fine-tuning of language models under restricted feedback conditions.
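For readers unfamiliar with prototypical networks, the sketch below illustrates the generic idea the abstract refers to: each class is summarized by the mean of a few example embeddings (its prototype), and a new example is scored by its distance to each prototype. This is not the Proto-RM architecture, which the abstract does not specify. The function names, the "preferred" and "rejected" labels, and the 2-D toy embeddings are all assumptions made for illustration.

```python
import math


def prototypes(support):
    """Average the embeddings of each class to get one prototype per class."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def class_probabilities(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in protos.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical toy embeddings: two labelled examples per class.
support = {
    "preferred": [[1.0, 1.0], [1.0, 3.0]],
    "rejected": [[-1.0, -1.0], [-3.0, -1.0]],
}
protos = prototypes(support)
probs = class_probabilities([1.0, 1.5], protos)
```

Because each prototype is an average over the available examples, a class can be represented from only a handful of labelled samples, which is the property that makes this family of methods attractive when human feedback is scarce.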