AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference
AdAEM addresses the need for more informative and up-to-date value assessment of LLMs, which matters to researchers and developers concerned with misalignment and bias, though it is an incremental improvement over static benchmarks.
The authors tackle the problem of measuring value differences among large language models (LLMs) by introducing AdAEM, an adaptive framework that automatically generates test questions to reveal LLMs' inclinations. Using it, they generate 12,310 questions and benchmark 16 LLMs.
Assessing the underlying value differences of Large Language Models (LLMs) enables comprehensive comparison of their misalignment, cultural adaptability, and biases. Nevertheless, current value measurement datasets face an informativeness challenge: because their test questions are often outdated, contaminated, or generic, they capture only the value orientations that different LLMs share, leading to saturated and thus uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible assessment framework for revealing LLMs' inclinations. Distinct from previous static benchmarks, AdAEM can automatically and adaptively generate and extend its test questions. It does so by probing the internal value boundaries of a diverse set of LLMs, developed across cultures and time periods, through in-context optimization. The optimization process theoretically maximizes an information-theoretic objective to extract the latest or most culturally controversial topics, providing more distinguishable and informative insights into models' value differences. In this way, AdAEM can co-evolve with the development of LLMs, consistently tracking their value dynamics. Using AdAEM, we generate 12,310 questions grounded in Schwartz Value Theory, conduct an extensive analysis to demonstrate our method's validity and effectiveness, and benchmark the values of 16 LLMs, laying the groundwork for better value research.
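
The abstract does not state the exact information-theoretic objective, so the following is only an illustrative sketch of the underlying idea: a question is informative when different models' value distributions on it diverge. As a stand-in for the paper's objective, it uses the generalized Jensen-Shannon divergence with uniform weights. The function names, the three-option value space, and the probability values are all hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(distributions):
    """Generalized Jensen-Shannon divergence with uniform weights.

    Each entry is one model's (hypothetical) distribution over value options
    for the same question. The score is 0 when all models agree and grows as
    their distributions separate. This is an assumed stand-in, not the
    objective used by AdAEM.
    """
    n = len(distributions)
    k = len(distributions[0])
    mixture = [sum(d[i] for d in distributions) / n for i in range(k)]
    return entropy(mixture) - sum(entropy(d) for d in distributions) / n

# Hypothetical data: three models, three value options per question.
generic_question = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]
controversial_question = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

scores = {
    "generic": js_divergence(generic_question),
    "controversial": js_divergence(controversial_question),
}
print(scores)
```

Under this assumed score, a generic question on which all models answer alike scores near zero, while a question that splits the models scores high. An adaptive generator in the spirit described above would keep and refine the high-scoring questions.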