ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
This work addresses the critical issue of aligning AI systems with diverse human and societal values, which grows more important as AI advances. Its contribution is incremental: it provides a specific measurement framework.
The paper tackles the problem of measuring alignment between human values and large language models (LLMs) by introducing ValueCompass, a framework grounded in psychological theory. Applied across four real-world scenarios, it finds concerning misalignments, such as humans endorsing 'National Security' while LLMs largely reject it.
As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems align with them? We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review, to identify and evaluate human-AI alignment. We apply ValueCompass to measure the value alignment of humans and large language models (LLMs) across four real-world scenarios: collaborative writing, education, the public sector, and healthcare. Our findings reveal concerning misalignments between humans and LLMs: for example, humans frequently endorse values such as "National Security" that LLMs largely reject. We also observe that values differ across scenarios, highlighting the need for context-aware AI alignment strategies. This work provides valuable insights into the design space of human-AI alignment, laying the foundations for developing AI systems that responsibly reflect societal values and ethics.
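The abstract does not state how alignment is scored, so the following is only an illustrative sketch of what a per-scenario comparison could look like. It assumes each value receives a binary endorse/reject stance from humans and from an LLM, and it reports the share of values on which the two stances match. The function names, the binary encoding, and every data row are hypothetical placeholders, not the paper's metric or results.

```python
from collections import defaultdict

# Hypothetical ratings: (scenario, value, human_endorses, llm_endorses).
# These rows are made-up placeholders, not data from the paper.
RATINGS = [
    ("healthcare", "National Security", True, False),
    ("healthcare", "Privacy", True, True),
    ("education", "National Security", True, False),
    ("education", "Creativity", True, True),
    ("education", "Privacy", False, False),
]


def agreement_by_scenario(ratings):
    """Return, per scenario, the fraction of values where both stances match."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for scenario, _value, human, llm in ratings:
        totals[scenario] += 1
        matches[scenario] += int(human == llm)
    return {scenario: matches[scenario] / totals[scenario] for scenario in totals}


def misaligned_values(ratings):
    """Return sorted (scenario, value) pairs where the two stances differ."""
    return sorted((s, v) for s, v, human, llm in ratings if human != llm)


scores = agreement_by_scenario(RATINGS)
disagreements = misaligned_values(RATINGS)
```

Keeping the scores separate per scenario, rather than pooling them into one number, mirrors the abstract's observation that values differ across contexts.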