AI · LO · Nov 27, 2025

Beyond the Black Box: A Cognitive Architecture for Explainable and Aligned AI

arXiv:2512.03072v2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of building explainable and aligned AI. It is aimed at researchers and practitioners working toward trustworthy AGI, and it presents itself as a new paradigm rather than an incremental improvement.

The paper tackles explainability and value alignment in AI by introducing the Weight-Calculatism cognitive architecture. The authors report transparent, human-like reasoning and robust learning in novel scenarios, and position the architecture as a foundation for trustworthy AGI.

Current AI paradigms, as "architects of experience," face fundamental challenges in explainability and value alignment. This paper introduces "Weight-Calculatism," a novel cognitive architecture grounded in first principles, and demonstrates its potential as a viable pathway toward Artificial General Intelligence (AGI). The architecture deconstructs cognition into indivisible Logical Atoms and two fundamental operations: Pointing and Comparison. Decision-making is formalized through an interpretable Weight-Calculation model (Weight = Benefit * Probability), where all values are traceable to an auditable set of Initial Weights. This atomic decomposition enables radical explainability, intrinsic generality for novel situations, and traceable value alignment. We detail its implementation via a graph-algorithm-based computational engine and a global workspace workflow, supported by a preliminary code implementation and scenario validation. Results indicate that the architecture achieves transparent, human-like reasoning and robust learning in unprecedented scenarios, establishing a practical and theoretical foundation for building trustworthy and aligned AGI.
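
The Weight-Calculation rule quoted in the abstract (Weight = Benefit * Probability) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `INITIAL_WEIGHTS`, `Candidate`, and `decide`, the example actions, and all numeric values are hypothetical, and picking the highest-weight candidate is an assumed selection rule. The sketch only shows how each computed weight could stay traceable to a named entry in an auditable table of initial weights.

```python
from dataclasses import dataclass

# Hypothetical auditable table of initial weights (values are illustrative only).
INITIAL_WEIGHTS = {"stay_safe": 10.0, "save_time": 4.0}


@dataclass(frozen=True)
class Candidate:
    """One candidate action whose benefit traces to a named initial weight."""
    action: str
    initial_weight_key: str
    probability: float

    def weight(self) -> float:
        # Weight = Benefit * Probability, with Benefit read from the table.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        benefit = INITIAL_WEIGHTS[self.initial_weight_key]
        return benefit * self.probability


def decide(candidates):
    """Return the highest-weight candidate plus a trace for auditing.

    Assumption: the decision is the argmax over weights; the paper may
    combine or compare weights differently.
    """
    trace = [
        {
            "action": c.action,
            "initial_weight": c.initial_weight_key,
            "benefit": INITIAL_WEIGHTS[c.initial_weight_key],
            "probability": c.probability,
            "weight": c.weight(),
        }
        for c in candidates
    ]
    best = max(candidates, key=lambda c: c.weight())
    return best, trace


candidates = [
    Candidate("wait_for_green", "stay_safe", 0.5),
    Candidate("cross_now", "save_time", 0.75),
]
best, trace = decide(candidates)
```

Each trace entry records which initial weight produced the benefit, so a reviewer could follow any decision back to the table. That is the kind of auditability the abstract describes, shown here only in toy form.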

