CEAI, Aug 6, 2025

Compressing Large Language Models with PCA Without Performance Loss

arXiv:2508.04307v1, 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the computational cost of deploying large models. The contribution is incremental, since it builds on existing PCA techniques.

The paper approaches model compression by applying PCA to the model inputs, which allows much smaller models to be trained on them: a one-layer classifier on PCA-compressed polar MNIST reaches over 98% accuracy with 840 parameters, and a decoder-only transformer on 70-dimensional PCA embeddings generates coherent token sequences that preserve over 97% cosine similarity with full MiniLM representations, using under 17% of GPT-2's parameter count.

We demonstrate that Principal Component Analysis (PCA), when applied in a structured manner, either to polar-transformed images or segment-wise to token sequences, enables extreme compression of neural models without sacrificing performance. Across three case studies, we show that a one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset with just 81000 parameters. A decoder-only transformer generates coherent token sequences from 70-dimensional PCA embeddings while preserving over 97 percent cosine similarity with full MiniLM representations, using less than 17 percent of the parameter count of GPT-2. These results highlight PCA-based input compression as a general and effective strategy for aligning model capacity with information content, enabling lightweight architectures across multiple modalities.
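The input-compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain PCA via SVD rather than the segment-wise or polar-transform variants, it runs on synthetic data standing in for MiniLM embeddings, and it measures the cosine similarity between original and reconstructed vectors rather than the quantity the paper reports. The 384-dimensional input width is an assumption (it is the width of some MiniLM sentence-embedding models); the 70 retained components come from the abstract. The helper names `fit_pca`, `compress`, `reconstruct`, and `mean_cosine` are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are orthonormal principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def compress(X, mean, components):
    """Project centered rows of X onto the retained principal directions."""
    return (X - mean) @ components.T


def reconstruct(Z, mean, components):
    """Map compressed rows back to the original feature space."""
    return Z @ components + mean


def mean_cosine(A, B):
    """Average row-wise cosine similarity between two matrices."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))


# Synthetic stand-in for embeddings (assumption): 500 vectors of width 384
# that lie close to a 20-dimensional subspace, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 384))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 384))

mean, components = fit_pca(X, k=70)      # 70 components, as in the abstract
Z = compress(X, mean, components)        # shape (500, 70)
X_hat = reconstruct(Z, mean, components)

print(Z.shape, mean_cosine(X, X_hat))
```

A downstream model would then be trained on `Z` instead of `X`, so its input layer scales with 70 rather than 384 features. How much similarity survives on real embeddings depends on how quickly their variance spectrum decays, which this synthetic example does not capture.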
