AI · Mar 25, 2024

CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

Feiteng Fang, Liang Zhu, Min Yang, Xi Feng, Jinchang Hou, Qixuan Zhao, Chengming Li, Xiping Hu, Ruifeng Xu

arXiv:2403.16649v2 · 33.883 citations · h-index: 16 · Has Code · LREC

Originality: Incremental advance

AI Analysis

This work addresses how to simplify and improve human alignment for large language models, a problem crucial to AI safety and usability. The contribution appears incremental, since it builds on existing RLHF techniques.

The paper tackles the complexity and training difficulty of reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. It introduces CLHA, a contrastive learning framework that combines a novel rescoring strategy with adaptive losses, and reports superior performance on the "Helpful and Harmless" dataset in reward model scores, automatic evaluations, and human assessments.

Reinforcement learning from human feedback (RLHF) is a crucial technique for aligning large language models (LLMs) with human preferences, ensuring that these LLMs behave in ways that are beneficial and comprehensible to users. However, a longstanding challenge of alignment techniques based on reinforcement learning is their inherent complexity and difficulty of training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) that aligns LLMs with human preferences directly. CLHA employs a novel rescoring strategy that evaluates the noise within the data by considering its inherent quality, and dynamically adjusts the training process accordingly. In addition, CLHA uses a pairwise contrastive loss and an adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. CLHA surpasses other alignment algorithms, showing superior performance in reward model scores, automatic evaluations, and human assessments on the widely used "Helpful and Harmless" dataset.
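The abstract names three components (a rescoring step for noisy preference pairs, a pairwise contrastive loss, and an adaptive supervised fine-tuning loss) without giving their formulas. Below is a minimal Python sketch of how such pieces could fit together, assuming a sigmoid of the reward-model score gap as the rescoring weight, a hinge-style pairwise ranking loss on length-normalized log-likelihoods, and an SFT term scaled by that same weight. All names (`PreferencePair`, `rescore_weight`, `pairwise_loss`, `weighted_sft_loss`, `margin`, `sft_coef`) are hypothetical illustrations, not CLHA's published equations or code.

```python
# Hypothetical sketch of a CLHA-style objective; NOT the paper's reference implementation.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    # Assumed inputs: per-token log-probs of each response under the policy,
    # plus scalar reward-model scores used for rescoring.
    chosen_logprobs: List[float]
    rejected_logprobs: List[float]
    chosen_reward: float
    rejected_reward: float


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def seq_score(token_logprobs: List[float]) -> float:
    """Length-normalized log-likelihood of one response."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def rescore_weight(pair: PreferencePair, temperature: float = 1.0) -> float:
    """Assumed rescoring rule: confidence that 'chosen' really beats 'rejected'.

    Near-tied or inverted reward gaps give a weight at or below 0.5,
    so pairs that look noisy contribute less to training.
    """
    return sigmoid((pair.chosen_reward - pair.rejected_reward) / temperature)


def pairwise_loss(pair: PreferencePair, margin: float = 0.5) -> float:
    """Hinge-style pairwise ranking loss: zero once chosen outscores rejected by `margin`."""
    gap = seq_score(pair.chosen_logprobs) - seq_score(pair.rejected_logprobs)
    return max(0.0, margin - gap)


def weighted_sft_loss(pair: PreferencePair, weight: float) -> float:
    """Negative log-likelihood of the chosen response, scaled by the rescoring weight."""
    return -weight * seq_score(pair.chosen_logprobs)


def combined_loss(pairs: List[PreferencePair], margin: float = 0.5, sft_coef: float = 1.0) -> float:
    """Batch mean of the weighted pairwise loss plus the weighted SFT loss."""
    total = 0.0
    for pair in pairs:
        w = rescore_weight(pair)
        total += w * pairwise_loss(pair, margin) + sft_coef * weighted_sft_loss(pair, w)
    return total / len(pairs)


clean = PreferencePair([-1.0, -1.0], [-2.0, -2.0], chosen_reward=2.0, rejected_reward=0.0)
noisy = PreferencePair([-2.0], [-1.0], chosen_reward=0.0, rejected_reward=0.0)
print(round(combined_loss([noisy]), 4))  # prints 1.75
```

Reusing one weight for both terms is a choice of this sketch: a pair whose reward gap is small or inverted pulls less on both the ranking objective and the likelihood objective, which is one plausible reading of "dynamically adjusting the training process".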
