CAMformer: Associative Memory is All You Need
This work addresses efficiency problems for AI hardware developers and users by offering a novel accelerator design for Transformers. The contribution is incremental, however: it builds on existing attention mechanisms and adds architectural improvements.
The paper tackles the scalability challenges that Transformers face due to quadratic attention costs. It proposes CAMformer, an accelerator that reinterprets attention as an associative memory operation and computes similarity through analog charge sharing. On BERT and Vision Transformer workloads, the reported results are over 10x better energy efficiency, up to 4x higher throughput, and 6-8x lower area than state-of-the-art accelerators, with near-lossless accuracy.
Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x better energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators, all while maintaining near-lossless accuracy.
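The data flow described in the abstract above (binary similarity scoring, two-stage top-k filtering, then a high-precision weighted sum) can be illustrated with a small software sketch. This is only a functional stand-in under stated assumptions, not the authors' hardware or algorithm: the abstract does not specify the binarization rule, tile size, k values, or score scaling, so sign binarization, the `tile_size`/`k_tile`/`k_final` parameters, the square-root scaling, and all function names here are hypothetical. A digital dot product over +1/-1 bits stands in for the analog charge-sharing match that BA-CAM would perform in hardware.

```python
import math

def binarize(vec):
    """Map each component to +1 or -1 by sign (assumption: zero maps to +1)."""
    return [1 if x >= 0 else -1 for x in vec]

def match_score(q_bits, k_bits):
    """Binary dot product: matches minus mismatches, i.e. d - 2 * Hamming distance."""
    return sum(a * b for a, b in zip(q_bits, k_bits))

def two_stage_top_k(scores, tile_size, k_tile, k_final):
    """Stage 1 keeps the k_tile best scores per tile; stage 2 keeps k_final overall."""
    candidates = []
    for start in range(0, len(scores), tile_size):
        tile = list(enumerate(scores[start:start + tile_size], start=start))
        tile.sort(key=lambda p: (-p[1], p[0]))  # highest score first, ties by index
        candidates.extend(tile[:k_tile])
    candidates.sort(key=lambda p: (-p[1], p[0]))
    return candidates[:k_final]

def sparse_binary_attention(query, keys, values, tile_size=2, k_tile=1, k_final=2):
    """Score with binarized vectors, then mix full-precision values of the survivors."""
    q_bits = binarize(query)
    scores = [match_score(q_bits, binarize(k)) for k in keys]
    selected = two_stage_top_k(scores, tile_size, k_tile, k_final)

    # Softmax over the surviving scores only (assumed scaling: sqrt of dimension).
    scale = math.sqrt(len(query))
    top = max(s for _, s in selected)
    weights = [math.exp((s - top) / scale) for _, s in selected]
    total = sum(weights)

    # Weighted sum uses the original, non-binarized value vectors.
    out = [0.0] * len(values[0])
    for (idx, _), w in zip(selected, weights):
        for j, v in enumerate(values[idx]):
            out[j] += (w / total) * v
    return out, [idx for idx, _ in selected]

query = [0.5, -1.0, 2.0, 0.1]
keys = [[1, -1, 1, 1], [-1, 1, -1, -1], [1, 1, 1, 1], [-1, -1, -1, 1]]
values = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output, kept = sparse_binary_attention(query, keys, values)
```

In this toy input, only one key per tile survives the first stage, so the softmax and the value mixing touch two keys instead of four. The sketch does not model the constant-time search claimed for the analog array: the software loop still visits every key, whereas the abstract attributes the speedup to sensing all matches physically.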