AI LG | Jul 13, 2023

Leveraging Contextual Counterfactuals Toward Belief Calibration

Qiuyi Zhang, Michael S. Lee, Sherol Chen

arXiv:2307.06513v15.42 citations | h-index: 11

Originality: Incremental advance

AI Analysis

This paper addresses the meta-alignment problem for AI systems by improving belief calibration across populations and contexts. The advance is incremental because it builds on existing alignment methods.

The paper tackles the problem of calibrating diverse human beliefs in AI alignment by introducing a framework that uses contextual counterfactuals and multi-objective optimization to adjust belief strengths across different contexts, demonstrating its efficacy on a toy credit decision dataset.

Beliefs and values are increasingly being incorporated into our AI systems through alignment processes, such as carefully curating data collection principles or regularizing the loss function used for training. However, the meta-alignment problem is that these human beliefs are diverse and not aligned across populations; furthermore, the implicit strength of each belief may not be well calibrated even among humans, especially when trying to generalize across contexts. Specifically, in high-regret situations, we observe that contextual counterfactuals and recourse costs are particularly important in updating a decision maker's beliefs and the strengths to which such beliefs are held. Therefore, we argue that including counterfactuals is key to an accurate calibration of beliefs during alignment. To do this, we first segment belief diversity into two categories: subjectivity (across individuals within a population) and epistemic uncertainty (within an individual across different contexts). By leveraging our notion of epistemic uncertainty, we introduce the 'belief calibration cycle' framework to more holistically calibrate this diversity of beliefs with context-driven counterfactual reasoning, using multi-objective optimization. We empirically apply our framework for finding a Pareto frontier of clustered optimal belief strengths that generalize across different contexts, demonstrating its efficacy on a toy dataset for credit decisions.
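
To make the phrase "Pareto frontier of belief strengths" concrete, here is a minimal sketch, not the paper's implementation. It assumes a single scalar belief strength `w` in [0, 1] and two hypothetical contexts, each with its own ideal strength (0.2 and 0.8) and a squared-error loss. The names `pareto_front` and `context_losses`, the ideal values, and the loss form are all illustrative assumptions; the abstract does not specify the actual objectives.

```python
def pareto_front(candidates, losses):
    """Return the candidates whose loss vectors are not dominated.

    A loss vector lj dominates li if lj is no worse in every objective
    and strictly better in at least one (lower loss is better).
    """
    front = []
    for i, li in enumerate(losses):
        dominated = any(
            all(a <= b for a, b in zip(lj, li))
            and any(a < b for a, b in zip(lj, li))
            for j, lj in enumerate(losses)
            if j != i
        )
        if not dominated:
            front.append(candidates[i])
    return front


def context_losses(w, ideals=(0.2, 0.8)):
    """Hypothetical per-context loss: squared distance of the belief
    strength w from each context's assumed ideal strength."""
    return tuple((w - t) ** 2 for t in ideals)


# Candidate belief strengths on a coarse grid from 0.0 to 1.0.
strengths = [i / 10 for i in range(11)]
losses = [context_losses(w) for w in strengths]

# Strengths that trade off the two contexts without being strictly worse
# than some other candidate in both.
front = pareto_front(strengths, losses)
print(front)
```

Under these assumptions, the strengths between the two context ideals survive, because moving toward one ideal always costs something in the other context, while strengths outside that range are worse in both contexts and drop out.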
