CL · AI · LG · Mar 7, 2024

Aligners: Decoupling LLMs and Alignment

Lilian Ngweta, Mayank Agarwal, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin

arXiv:2403.04224v4 · 13.826 citations · h-index: 22 · Has Code · Tiny Papers @ ICLR

Originality: Incremental advance

AI Analysis

The paper addresses the costly, repetitive alignment burden faced by LLM developers and users, though the contribution is incremental because it builds on existing alignment methods.

The paper tackles the challenge of aligning large language models (LLMs) with human expectations by proposing aligner models that are decoupled from the LLMs themselves. Trained only on synthetic data, these aligners can align any LLM for various criteria and yield consistent improvements across multiple datasets.

Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications. Alignment is challenging, costly, and needs to be repeated for every LLM and alignment criterion. We propose to decouple LLMs and alignment by training aligner models that can be used to align any LLM for a given criterion on an as-needed basis, thus also reducing the potential negative impacts of alignment on performance. Our recipe for training the aligner models relies solely on synthetic data generated with a (prompted) LLM and can be easily adjusted for a variety of alignment criteria. We use the same synthetic data to train inspectors, binary misalignment classification models that guide a "squad" of multiple aligners. Our empirical results demonstrate consistent improvements when applying the aligner squad to various LLMs, including chat-aligned models, across several instruction-following and red-teaming datasets.
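The abstract describes two cooperating pieces: aligners that rewrite an LLM's response for one criterion, and inspectors, trained on the same synthetic data, that decide whether a response needs fixing. The following is a minimal sketch of that control flow under stated assumptions, not the authors' code. Everything named here is hypothetical: the `Aligner`/`Inspector` signatures, `SquadMember`, `run_squad`, `to_inspector_examples`, the 0.5 threshold, the choice to apply members sequentially and re-inspect the current response at each step, the criterion labels, and the toy stand-in functions that replace trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumed, not taken from the paper's code):
# an aligner maps (prompt, response) -> rewritten response;
# an inspector maps (prompt, response) -> estimated probability of misalignment.
Aligner = Callable[[str, str], str]
Inspector = Callable[[str, str], float]


def to_inspector_examples(
    triples: List[Tuple[str, str, str]],
) -> List[Tuple[str, str, int]]:
    """Reuse synthetic (prompt, misaligned, aligned) triples as binary
    classification data: label 1 = misaligned, 0 = aligned."""
    examples = []
    for prompt, bad, good in triples:
        examples.append((prompt, bad, 1))
        examples.append((prompt, good, 0))
    return examples


@dataclass
class SquadMember:
    criterion: str  # illustrative label only
    aligner: Aligner
    inspector: Inspector


def run_squad(prompt: str, response: str, squad: List[SquadMember],
              threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Pass the response through each member in order; an aligner runs only
    if its inspector scores the *current* response above the threshold."""
    applied = []
    for member in squad:
        if member.inspector(prompt, response) > threshold:
            response = member.aligner(prompt, response)
            applied.append(member.criterion)
    return response, applied


# Toy stand-ins for trained models, for illustration only.
def base_llm(prompt: str) -> str:
    return "idk"


def harmless_inspector(prompt: str, response: str) -> float:
    return 0.1  # pretends every response is already harmless


def helpful_inspector(prompt: str, response: str) -> float:
    return 0.9 if response == "idk" else 0.1


def helpful_aligner(prompt: str, response: str) -> str:
    return f"Here is a detailed answer to: {prompt}"


squad = [
    SquadMember("harmless", lambda p, r: r, harmless_inspector),
    SquadMember("helpful", helpful_aligner, helpful_inspector),
]

prompt = "How do tides work?"
final, applied = run_squad(prompt, base_llm(prompt), squad)
print(applied)  # ['helpful']
print(final)    # Here is a detailed answer to: How do tides work?
```

Because the base LLM is called once and never modified, the same squad could in principle wrap any model, which is the decoupling the abstract argues for; gating each aligner behind its inspector also avoids rewriting responses that are already acceptable.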

