Joshua T. S. Hewson

AI
h-index2
3papers
1citation
Novelty52%
AI Score25

3 Papers

AIJul 8, 2024
One system for learning and remembering episodes and rules

Joshua T. S. Hewson, Sabina J. Sloman, Marina Dubova

Humans can learn individual episodes and generalizable rules and also successfully retain both kinds of acquired knowledge over time. In the cognitive science literature, (1) learning individual episodes and rules and (2) learning and remembering are often both conceptualized as competing processes that necessitate separate, complementary learning systems. Inspired by recent research in statistical learning, we challenge these trade-offs, hypothesizing that they arise from capacity limitations rather than from the inherent incompatibility of the underlying cognitive processes. Using an associative learning task, we show that one system with excess representational capacity can learn and remember both episodes and rules.

AIOct 21, 2024
Combining Theory of Mind and Kindness for Self-Supervised Human-AI Alignment

Joshua T. S. Hewson

As artificial intelligence (AI) becomes deeply integrated into critical infrastructures and everyday life, ensuring its safe deployment is one of humanity's most urgent challenges. Current AI models prioritize task optimization over safety, leading to risks of unintended harm. These risks are difficult to address due to the competing interests of governments, businesses, and advocacy groups, all of which have different priorities in the AI race. Current alignment methods, such as reinforcement learning from human feedback (RLHF), focus on extrinsic behaviors without instilling a genuine understanding of human values. These models are vulnerable to manipulation and lack the social intelligence necessary to infer the mental states and intentions of others, raising concerns about their ability to safely and responsibly make important decisions in complex and novel situations. Furthermore, the divergence between extrinsic and intrinsic motivations in AI introduces the risk of deceptive or harmful behaviors, particularly as systems become more autonomous and intelligent. We propose a novel human-inspired approach which aims to address these various concerns and help align competing objectives.

AIOct 21, 2024
We Urgently Need Intrinsically Kind Machines

Joshua T. S. Hewson

Artificial Intelligence systems are rapidly evolving, integrating extrinsic and intrinsic motivations. While these frameworks offer benefits, they risk misalignment at the algorithmic level while appearing superficially aligned with human values. In this paper, we argue that an intrinsic motivation for kindness is crucial for making sure these models are intrinsically aligned with human values. We argue that kindness, defined as a form of altruism motivated to maximize the reward of others, can counteract any intrinsic motivations that might lead the model to prioritize itself over human well-being. Our approach introduces a framework and algorithm for embedding kindness into foundation models by simulating conversations. Limitations and future research directions for scalable implementation are discussed.