LGMLOct 21, 2025

Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

arXiv:2510.18638v2h-index: 3
Originality Incremental advance
AI Analysis

This work addresses a theoretical gap in transformer capabilities for dynamics-driven tasks, though it is incremental as it builds on prior linear regression studies.

The paper tackles the problem of understanding how transformers perform in-context learning for Markovian dynamical functions, revealing that finding optimal parameters for a single-layer linear self-attention model is NP-hard, which limits its ability to represent structured dynamics.

Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL). Existing theoretical studies on ICL have mainly focused on linear regression tasks, often with i.i.d. inputs. To understand how transformers express ICL when modeling dynamics-driven functions, we investigate Markovian function learning through a structured ICL setup, where we characterize the loss landscape to reveal underlying optimization behaviors. Specifically, we (1) provide the closed-form expression of the global minimizer (in an enlarged parameter space) for a single-layer linear self-attention (LSA) model; (2) prove that recovering transformer parameters that realize the optimal solution is NP-hard in general, revealing a fundamental limitation of one-layer LSA in representing structured dynamical functions; and (3) supply a novel interpretation of a multilayer LSA as performing preconditioned gradient descent to optimize multiple objectives beyond the square loss. These theoretical results are numerically validated using simplified transformers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes