LG · ML · May 17, 2025

Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures

arXiv:2505.11918v1 · 1 citation · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a gap in the understanding of transformers for unsupervised learning and offers a versatile tool for fundamental statistical estimation problems, though it is an incremental extension of transformer applications.

The paper studies transformers for unsupervised learning by applying them to Gaussian Mixture Models (GMMs). It shows empirically that the proposed TGMM framework mitigates limitations of classical methods such as EM and spectral algorithms while remaining reasonably robust to distribution shifts.
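For readers unfamiliar with the EM baseline named above, here is a minimal textbook sketch of EM for a one-dimensional GMM. It is not the paper's TGMM model or its transformer construction; the helper names (`gaussian_pdf`, `em_gmm_1d`), the fixed iteration count, the absence of a convergence check, and the `var_floor` guard are all illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, init_means, n_iter=50, var_floor=1e-6):
    """Hypothetical helper: textbook EM for a 1-D GMM, run for a fixed number of iterations."""
    n, k = len(xs), len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    # Assumption: every component starts with the pooled sample variance.
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    for _ in range(n_iter):
        # E-step: responsibilities r[i][j] = P(component j | x_i) under current parameters.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # guard against collapse onto a single point
    return weights, means, variances


# Usage: two well-separated components and deliberately poor initial means.
rng = random.Random(0)
data = [rng.gauss(-5.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances = em_gmm_1d(data, init_means=[-1.0, 1.0])
print([round(m, 2) for m in sorted(means)])
```

EM only climbs to a local optimum of the likelihood, so its result depends on `init_means`; the well-separated data in this example hides that sensitivity.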

The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model at inference time is widely believed to play a key role in the understanding of pre-trained large language models. However, most recent work has focused on supervised learning topics such as in-context learning, leaving the field of unsupervised learning largely unexplored. This paper investigates the capabilities of transformers in solving Gaussian Mixture Models (GMMs), a fundamental unsupervised learning problem, through the lens of statistical estimation. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks using a shared transformer backbone. The learned models are empirically demonstrated to effectively mitigate the limitations of classical methods such as Expectation-Maximization (EM) or spectral algorithms, while also exhibiting reasonable robustness to distribution shifts. Theoretically, we prove that transformers can approximate both the EM algorithm and a core component of spectral methods (cubic tensor power iterations). These results bridge the gap between practical success and theoretical understanding, positioning transformers as versatile tools for unsupervised learning.
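The abstract names cubic tensor power iteration as the spectral-method component that transformers are shown to approximate. The sketch below is a plain-Python version of the standard update v ← T(I, v, v) / ‖T(I, v, v)‖, not the paper's transformer construction. It assumes the third-order tensor is already given and orthogonally decomposable; in a GMM spectral method that tensor would be estimated from (whitened) moments of the data, a step omitted here. The helper names (`build_tensor`, `contract_ivv`, `tensor_power_iteration`) are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_tensor(weights, vectors):
    """Hypothetical helper: T = sum_i w_i * a_i (x) a_i (x) a_i, a symmetric d x d x d tensor."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, a in zip(weights, vectors):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * a[i] * a[j] * a[k]
    return T


def contract_ivv(T, v):
    """The map T(I, v, v): entry i is sum_{j,k} T[i][j][k] * v[j] * v[k]."""
    d = len(v)
    return [sum(T[i][j][k] * v[j] * v[k] for j in range(d) for k in range(d)) for i in range(d)]


def tensor_power_iteration(T, v0, n_iter=30):
    """Repeat v <- T(I, v, v) / ||T(I, v, v)||; return the eigenvalue T(v, v, v) and v."""
    v = normalize(v0)
    for _ in range(n_iter):
        v = normalize(contract_ivv(T, v))
    eigenvalue = sum(x * y for x, y in zip(v, contract_ivv(T, v)))
    return eigenvalue, v


# Usage: orthonormal components a1, a2 with weights 3 and 1 (synthetic, not from the paper).
s = 1 / math.sqrt(2)
a1, a2 = [s, s, 0.0], [s, -s, 0.0]
T = build_tensor([3.0, 1.0], [a1, a2])
lam, v = tensor_power_iteration(T, v0=[1.0, 0.2, 0.1])
print(round(lam, 6), [round(x, 6) for x in v])
```

Which component the iteration recovers depends on the starting vector: here the weight times the overlap with a1 is larger than for a2, so it converges to a1 with eigenvalue 3. Practical spectral methods typically add random restarts and deflation to recover every component.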

Code Implementations: 1 repo