AIApr 30
Causal Foundations of Collective AgencyFrederik Hytting Jørgensen, Sebastian Weichwald, Lewis Hammond
A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group's joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games -- which are causal models of strategic, multi-agent interactions -- and causal abstraction -- which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.
MLJun 1, 2023
Unfair Utilities and First Steps Towards Improving ThemFrederik Hytting Jørgensen, Sebastian Weichwald, Jonas Peters
Many fairness criteria constrain the policy or choice of predictors, which can have unwanted consequences, in particular, when optimizing the policy under such constraints. Here, we advocate to instead focus on the utility function the policy is optimizing for. We define value of information fairness and propose to not use utility functions that violate this criterion. This principle suggests to modify these utility functions such that they satisfy value of information fairness. We describe how this can be done and discuss consequences for the corresponding optimal policies. We apply our framework to thought experiments and the COMPAS data. Focussing on the utility function provides better answers than existing fairness notions: We are not aware of any intuitively fair policy that is disallowed by value of information fairness, and when we find that value of information fairness recommends an intuitively unfair policy, no existing fairness notion finds an intuitively fair policy.
MLJan 31, 2025
What is causal about causal models and representations?Frederik Hytting Jørgensen, Luigi Gresele, Sebastian Weichwald
Causal Bayesian networks are 'causal' models since they make predictions about interventional distributions. To connect such causal model predictions to real-world outcomes, we must determine which actions in the world correspond to which interventions in the model. For example, to interpret an action as an intervention on a treatment variable, the action will presumably have to a) change the distribution of treatment in a way that corresponds to the intervention, and b) not change other aspects, such as how the outcome depends on the treatment; while the marginal distributions of some variables may change as an effect. We introduce a formal framework to make such requirements for different interpretations of actions as interventions precise. We prove that the seemingly natural interpretation of actions as interventions is circular: Under this interpretation, every causal Bayesian network that correctly models the observational distribution is trivially also interventionally valid, and no action yields empirical data that could possibly falsify such a model. We prove an impossibility result: No interpretation exists that is non-circular and simultaneously satisfies a set of natural desiderata. Instead, we examine non-circular interpretations that may violate some desiderata and show how this may in turn enable the falsification of causal models. By rigorously examining how a causal Bayesian network could be a 'causal' model of the world instead of merely a mathematical object, our formal framework contributes to the conceptual foundations of causal representation learning, causal discovery, and causal abstraction, while also highlighting some limitations of existing approaches.