LGAIMLMay 26, 2025

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions

arXiv:2505.19531v19 citationsh-index: 14
Originality Incremental advance
AI Analysis

This work addresses computational limits in learning Boolean functions for AI theory, offering insights into minimalist architectures, but it is incremental as it builds on prior methods like Chain-of-Thought.

The paper tackled the problem of learning k-bit Boolean functions like AND and OR using a minimalist single-head softmax-attention mechanism, showing that these functions are unsolvable with attention alone but become solvable with teacher forcing, requiring only one gradient descent update instead of multi-step reasoning.

We study the computational limits of learning $k$-bit Boolean functions (specifically, $\mathrm{AND}$, $\mathrm{OR}$, and their noisy variants), using a minimalist single-head softmax-attention mechanism, where $k=Θ(d)$ relevant bits are selected from $d$ inputs. We show that these simple $\mathrm{AND}$ and $\mathrm{OR}$ functions are unsolvable with a single-head softmax-attention mechanism alone. However, with teacher forcing, the same minimalist attention is capable of solving them. These findings offer two key insights: Architecturally, solving these Boolean tasks requires only minimalist attention, without deep Transformer blocks or FFNs. Methodologically, one gradient descent update with supervision suffices and replaces the multi-step Chain-of-Thought (CoT) reasoning scheme of [Kim and Suzuki, ICLR 2025] for solving Boolean problems. Together, the bounds expose a fundamental gap between what this minimal architecture achieves under ideal supervision and what is provably impossible under standard training.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes