AR, CV, LG, ML | Nov 16, 2023

Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication

arXiv:2311.10207v2 | 1 citation | h-index: 24
Originality: Incremental advance
AI Analysis

The paper addresses the hardware bottleneck in AI applications by enabling more efficient matrix multiplication. The advance is incremental because it builds on the existing Maddness method.

The paper tackles the computational and energy demands of matrix multiplication in AI by introducing Stella Nera, a Maddness-based hardware accelerator. It achieves 161 TOp/s/W at 0.55 V, 25x better energy efficiency than conventional MatMul accelerators, and with differentiable fine-tuning reaches 92.5% Top-1 accuracy on CIFAR-10.

Artificial intelligence has surged in recent years, with advancements in machine learning rapidly impacting nearly every area of life. However, the growing complexity of these models has far outpaced advancements in available hardware accelerators, leading to significant computational and energy demands, primarily due to matrix multiplications, which dominate the compute workload. Maddness (i.e., Multiply-ADDitioN-lESS) presents a hash-based version of product quantization, which renders matrix multiplications into lookups and additions, eliminating the need for multipliers entirely. We present Stella Nera, the first Maddness-based accelerator achieving an energy efficiency of 161 TOp/s/W@0.55V, 25x better than conventional MatMul accelerators due to its small components and reduced computational complexity. We further enhance Maddness with a differentiable approximation, allowing for gradient-based fine-tuning and achieving an end-to-end performance of 92.5% Top-1 accuracy on CIFAR-10.
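To make the "lookups and additions" idea in the abstract concrete, here is a minimal NumPy sketch of a Maddness-style approximate matrix multiplication. It is an illustration under stated assumptions, not the authors' implementation or the accelerator's dataflow: the function names (`fit_tree`, `encode`, `fit`, `approx_matmul`) are hypothetical, and the split rule (round-robin dimension per tree level, median threshold per node) is a simplification of how the hash function is actually learned. The structure it does share with the description above is that each sub-vector is hashed to a bucket by comparisons only, and inference sums precomputed table rows instead of multiplying.

```python
import numpy as np

def fit_tree(X_sub, depth=4):
    """Learn a balanced binary hash tree: one split dim per level, one threshold per node.
    Assumption: round-robin dims and median thresholds stand in for the real heuristics."""
    bucket = np.zeros(X_sub.shape[0], dtype=np.int64)
    dims, thresholds = [], []
    for level in range(depth):
        dim = level % X_sub.shape[1]
        thr = np.zeros(2 ** level)
        for node in range(2 ** level):
            vals = X_sub[bucket == node, dim]
            thr[node] = np.median(vals) if vals.size else 0.0
        # Descend one level: left child = 2*node, right child = 2*node + 1.
        bucket = 2 * bucket + (X_sub[:, dim] > thr[bucket])
        dims.append(dim)
        thresholds.append(thr)
    return dims, thresholds

def encode(X_sub, dims, thresholds):
    """Hash each row to a bucket index using comparisons only (no multiplications)."""
    bucket = np.zeros(X_sub.shape[0], dtype=np.int64)
    for dim, thr in zip(dims, thresholds):
        bucket = 2 * bucket + (X_sub[:, dim] > thr[bucket])
    return bucket

def fit(X_train, B, n_codebooks, depth=4):
    """Offline step: per codebook, learn the tree, the bucket prototypes, and the
    lookup table of prototype-times-B products. Requires D divisible by n_codebooks."""
    model = []
    for cols in np.split(np.arange(X_train.shape[1]), n_codebooks):
        X_sub = X_train[:, cols]
        dims, thr = fit_tree(X_sub, depth)
        codes = encode(X_sub, dims, thr)
        protos = np.zeros((2 ** depth, len(cols)))
        for k in range(2 ** depth):
            if np.any(codes == k):
                protos[k] = X_sub[codes == k].mean(axis=0)  # prototype = bucket mean
        lut = protos @ B[cols, :]  # multiplications happen here, once, offline
        model.append((cols, dims, thr, lut))
    return model

def approx_matmul(X, model):
    """Online step: approximate X @ B by summing one looked-up LUT row per codebook."""
    out = 0.0
    for cols, dims, thr, lut in model:
        out = out + lut[encode(X[:, cols], dims, thr)]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 16))
    B = rng.normal(size=(16, 8))
    model = fit(X, B, n_codebooks=4, depth=4)
    approx = approx_matmul(X, model)
    rel_err = np.linalg.norm(approx - X @ B) / np.linalg.norm(X @ B)
    print(f"relative error: {rel_err:.3f}")
```

With `depth=4`, each codebook has 16 buckets, so the online path for one input row is four comparisons and one table read per codebook, followed by additions across codebooks. The approximation error depends on how well the bucket prototypes represent the data, which is the accuracy cost that fine-tuning is meant to recover.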

Code Implementations: 1 repo