Enhancing RLHF with Human Gaze Modeling
This work addresses efficiency issues in RLHF for language model alignment, offering incremental improvements in computational cost reduction.
The paper tackles the high computational cost of Reinforcement Learning from Human Feedback (RLHF) by using human gaze modeling in two ways: to enhance reward models and to distribute sparse rewards across tokens. The result is faster convergence with equal or slightly better performance, which reduces computational cost.
Reinforcement Learning from Human Feedback (RLHF) aligns language models with human preferences but is computationally expensive. We explore two approaches that leverage human gaze modeling to enhance RLHF: (1) gaze-aware reward models and (2) gaze-based distribution of sparse rewards at the token level. Our experiments demonstrate that gaze-informed RLHF achieves faster convergence while maintaining or slightly improving performance, thus reducing computational costs during policy optimization. These results show that human gaze provides a valuable and underused signal for policy optimization, pointing to a promising direction for improving RLHF efficiency.
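To make the second approach concrete, the following is a minimal sketch of one way a sparse, sequence-level reward could be spread over tokens in proportion to gaze weights. It is an illustration under stated assumptions, not the method used in the paper: the function name `distribute_reward`, the proportional weighting rule, and the uniform fallback are all hypothetical, and the gaze weights are assumed to come from some external gaze prediction model that yields one non-negative score per token.

```python
from typing import List


def distribute_reward(sequence_reward: float, gaze_weights: List[float]) -> List[float]:
    """Split one scalar reward across tokens in proportion to gaze weights.

    Hypothetical helper: the proportional rule is an assumption made for
    illustration, not the scheme reported in the paper.
    """
    if not gaze_weights:
        return []
    if any(w < 0 for w in gaze_weights):
        raise ValueError("gaze weights must be non-negative")

    total = sum(gaze_weights)
    if total == 0:
        # No gaze signal at all: fall back to a uniform split (assumption).
        return [sequence_reward / len(gaze_weights)] * len(gaze_weights)

    # Each token receives a share of the reward equal to its share of the gaze mass.
    return [sequence_reward * w / total for w in gaze_weights]


if __name__ == "__main__":
    # Made-up gaze weights for a three-token response.
    print(distribute_reward(1.0, [0.5, 0.25, 0.25]))
```

Under this rule the per-token rewards sum to the original sequence-level reward, so the total reward signal is unchanged and only its placement along the sequence differs.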