CV | AI | Apr 7

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

arXiv:2604.06129 | 87.3 | Has Code
AI Analysis

PoM addresses the scalability problem faced by researchers and practitioners who use transformers in domains such as text generation and image processing. It offers a drop-in replacement for self-attention, backed by a proof that transformers using it remain universal sequence-to-sequence approximators.

The paper tackles the computational inefficiency of self-attention in transformers by introducing the Polynomial Mixer (PoM), a linear-time token mixing mechanism that matches attention-based model performance across five domains while drastically reducing computational costs for long sequences.

This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
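The aggregate-then-retrieve idea described in the abstract can be sketched in a few lines. This is only an illustrative reading of that description, not the code from the linked repository: the weight names `W_h`, `W_s`, and `W_o`, the use of element-wise powers as the polynomial, mean pooling, and the sigmoid gate are all assumptions made for the example, and the random weights stand in for learned parameters.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def polynomial_mixer(tokens, W_h, W_s, W_o, degree=2):
    """Mix n tokens of size d in O(n) time (illustrative, not the paper's code)."""
    n = len(tokens)
    state = [0.0] * (len(W_h) * degree)
    # Step 1 (aggregate): a single pass builds one shared summary of the sequence.
    for x in tokens:
        h = matvec(W_h, x)
        # Polynomial features: element-wise powers 1..degree of the projection.
        feats = [v ** p for p in range(1, degree + 1) for v in h]
        state = [s + f / n for s, f in zip(state, feats)]
    # Step 2 (retrieve): each token gates the shared summary with its own query.
    out = []
    for x in tokens:
        gate = [1.0 / (1.0 + math.exp(-v)) for v in matvec(W_s, x)]
        out.append(matvec(W_o, [g * s for g, s in zip(gate, state)]))
    return out


# Hypothetical sizes: token width d, projection width m, degree k, n tokens.
rng = random.Random(0)
d, m, k, n = 4, 3, 2, 6
W_h = rand_matrix(m, d, rng)
W_s = rand_matrix(m * k, d, rng)
W_o = rand_matrix(d, m * k, rng)
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
mixed = polynomial_mixer(tokens, W_h, W_s, W_o, degree=k)
print(len(mixed), len(mixed[0]))
```

Each token is visited twice and no token-to-token comparison is made, so the cost of this sketch grows linearly with the number of tokens, whereas standard self-attention compares every pair of tokens. The shared summary has a fixed size that does not depend on sequence length.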
