The Elephant in the Room -- Why AI Safety Demands Diverse Teams
This paper addresses AI safety for researchers and practitioners, offering an incremental shift in perspective rather than a technical breakthrough.
The paper tackles AI safety and alignment by proposing to treat it as a social science problem, arguing that social science tools and diverse teams can lead to better approaches for aligning AI with human goals.
We consider that existing approaches to AI "safety" and "alignment" may not be using the most effective tools, teams, or methods. We suggest that a better alternative may be to treat alignment as a social science problem, since the social sciences enjoy a rich toolkit of models for understanding and aligning motivation and behavior, much of which could be repurposed for problems involving AI models, and we enumerate reasons why this is so. We introduce an alternative alignment approach informed by social science tools and characterized by three steps: (1) defining a positive desired social outcome for human/AI collaboration as the goal or "North Star," (2) properly framing knowns and unknowns, and (3) forming diverse teams to investigate, observe, and navigate emerging challenges in alignment.