LG, ML | Aug 11, 2025

Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport

arXiv:2508.08369v1 | 5 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work provides a unified, principled mathematical foundation for SDPA, a core component of deep learning. The advance is incremental, but it clarifies how SDPA performs optimization in the forward pass and learning in the backward pass.

The paper addresses the lack of a first-principles justification for scaled-dot-product attention (SDPA) by showing that SDPA solves a degenerate, one-sided Entropic Optimal Transport problem. It also proves that the gradients computed by backpropagation are mathematically identical to advantage-based policy gradients.

The scaled-dot-product attention (SDPA) mechanism is a core component of modern deep learning, but its mathematical form is often motivated by heuristics. This work provides a first-principles justification for SDPA. We first show that the attention forward pass is the exact solution to a degenerate, one-sided Entropic Optimal Transport (EOT) problem, which seeks a distribution that maximizes similarity while being maximally entropic. This optimization perspective has a direct consequence for the backward pass. We prove that the standard gradient computed via backpropagation is mathematically identical to an advantage-based policy gradient, a variance-reduced update rule from reinforcement learning. Crucially, we demonstrate that the EOT formulation of the forward pass induces a specific information geometry on the space of attention distributions. It is this geometry, characterized by the Fisher Information Matrix, that dictates the precise form of the learning gradient, revealing the advantage-based update as a natural consequence of the optimization problem being solved. This unified view reveals SDPA as a principled mechanism where the forward pass performs optimal inference and the backward pass implements a rational, manifold-aware learning update.
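The two claims in the abstract can be checked numerically on a single query. The sketch below is illustrative and is not the paper's code. It assumes the one-sided entropic problem takes the form "maximize <p, s> + tau * H(p) over the probability simplex", where s is the vector of query-key scores and tau is a temperature. It also assumes a per-key "reward" r_i, standing in for the inner product between the upstream gradient and the value vector v_i. The function names, `tau`, and `rewards` are hypothetical labels introduced here.

```python
import math
import random


def softmax(scores, tau=1.0):
    """Attention weights for one query: softmax(scores / tau)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def eot_objective(p, scores, tau=1.0):
    """Assumed objective: expected similarity plus tau times Shannon entropy."""
    similarity = sum(pi * si for pi, si in zip(p, scores))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return similarity + tau * entropy


def random_simplex_point(n, rng):
    """A random probability vector, used only as a competitor to softmax."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def expected_reward(scores, rewards, tau=1.0):
    """sum_i p_i * r_i with p = softmax(scores / tau)."""
    return sum(p * r for p, r in zip(softmax(scores, tau), rewards))


def score_gradient_advantage(scores, rewards, tau=1.0):
    """Gradient of expected_reward w.r.t. the scores in advantage form:
    p_i * (r_i - baseline) / tau, where baseline = sum_j p_j * r_j."""
    p = softmax(scores, tau)
    baseline = sum(pi * ri for pi, ri in zip(p, rewards))
    return [pi * (ri - baseline) / tau for pi, ri in zip(p, rewards)]


def score_gradient_numeric(scores, rewards, tau=1.0, eps=1e-6):
    """Central finite differences, standing in for backpropagation."""
    grads = []
    for i in range(len(scores)):
        up = list(scores)
        dn = list(scores)
        up[i] += eps
        dn[i] -= eps
        diff = expected_reward(up, rewards, tau) - expected_reward(dn, rewards, tau)
        grads.append(diff / (2.0 * eps))
    return grads
```

Under these assumptions, `softmax(scores, tau)` should score at least as high on `eot_objective` as any other point on the simplex, and `score_gradient_advantage` should agree with `score_gradient_numeric` up to finite-difference error. The subtracted baseline is the expected reward under the attention distribution, which is what makes the update "advantage-based" in the reinforcement-learning sense.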

