Value Alignment from Unstructured Text
This work addresses value alignment in AI, a concern for researchers and practitioners alike, by reducing reliance on costly annotated data, though it appears incremental because it builds on existing synthetic data techniques.
The paper tackles the problem of aligning large language models to value systems without relying on expensive supervised data. It introduces a methodology based on scalable synthetic data generation from unstructured text and demonstrates improved performance on the Mistral-7B-Instruct model, measured with automatic metrics and win rates.
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach uses scalable synthetic data generation techniques to effectively align the model to the values present in the unstructured data. Through two distinct use-cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents and shows improved performance over other approaches, as quantified through automatic metrics and win rates.
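Win rates summarize pairwise comparisons: for each prompt, a judge (human or LLM) sees the aligned model's response next to a baseline's response and picks the better one, or declares a tie. The sketch below computes that statistic under stated assumptions. The label names (`"model"`, `"baseline"`, `"tie"`), the function `win_rate`, and counting a tie as half a win are illustrative choices, not details taken from the paper, whose exact judging protocol and tie handling are not described in this summary.

```python
from collections import Counter
from typing import Iterable, Literal

# One pairwise judgment: which response the judge preferred.
# Label names are hypothetical, chosen for this illustration.
Judgment = Literal["model", "baseline", "tie"]


def win_rate(judgments: Iterable[Judgment], tie_weight: float = 0.5) -> float:
    """Return the fraction of comparisons won by the aligned model.

    Each tie contributes `tie_weight` of a win. Counting ties as 0.5 is an
    assumed convention here; the paper's own tie handling may differ.
    """
    counts = Counter(judgments)
    unknown = set(counts) - {"model", "baseline", "tie"}
    if unknown:
        raise ValueError(f"unexpected judgment labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("win rate is undefined for zero comparisons")
    return (counts["model"] + tie_weight * counts["tie"]) / total


if __name__ == "__main__":
    # Hypothetical judgments for illustration only; not results from the paper.
    example = ["model", "model", "baseline", "tie", "model"]
    print(f"win rate: {win_rate(example):.2f}")  # prints "win rate: 0.70"
```

Exposing `tie_weight` as a parameter makes the tie convention explicit: setting it to `0.0` yields a strict win rate, which is lower whenever ties occur, so the convention should be reported alongside any win-rate figure.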