AI CY | Apr 19, 2024

Mapping Social Choice Theory to RLHF

arXiv:2404.13038v1 | 20.933 citations | h-index: 7

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of aggregating human preferences in AI systems. Its contribution is incremental: it primarily discusses theoretical connections and presents no new empirical results.

The paper analyzes the relationship between social choice theory and reinforcement learning from human feedback (RLHF), identifying key differences in their problem settings and discussing how these affect the interpretation of social choice results in RLHF.

Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
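As a rough illustration of what "aggregating human preferences amid disagreement" can look like in a voting-style setting, the sketch below applies pairwise majority rule to ranked preferences and checks for a Condorcet winner. It is not taken from the paper; the annotator rankings, the response labels `A`, `B`, `C`, and the function names are all hypothetical, and majority rule is only one of many aggregation rules studied in social choice.

```python
from itertools import combinations


def pairwise_majority(rankings):
    """Return the set of (winner, loser) pairs decided by a strict majority.

    Each ranking is a list of candidates ordered from most to least preferred.
    """
    candidates = sorted(rankings[0])
    wins = set()
    for a, b in combinations(candidates, 2):
        # Count how many rankings place a above b.
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        b_over_a = len(rankings) - a_over_b
        if a_over_b > b_over_a:
            wins.add((a, b))
        elif b_over_a > a_over_b:
            wins.add((b, a))
    return wins


def condorcet_winner(rankings):
    """Return the candidate that beats every other head-to-head, else None."""
    candidates = sorted(rankings[0])
    wins = pairwise_majority(rankings)
    for c in candidates:
        if all((c, other) in wins for other in candidates if other != c):
            return c
    return None


# Hypothetical annotators ranking three model responses (assumed data).
cyclic = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
mostly_agree = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]

print(condorcet_winner(cyclic))        # majorities form a cycle, so no winner
print(condorcet_winner(mostly_agree))  # one response beats both others
```

In the first profile, majorities prefer A to B, B to C, and C to A, so no single response is majority-preferred to all the others. That is the kind of disagreement that makes the choice of aggregation rule matter.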
