NE AI AR LG · Feb 12, 2025

Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2

Steven Abreu, Sumit Bam Shrestha, Rui-Jie Zhu, Jason Eshraghian

arXiv:2503.18002v2 · 11 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses the energy efficiency of LLM deployment, particularly in edge computing. The advance is incremental, since it adapts existing neuromorphic principles to a specific hardware platform.

The paper tackles the high energy consumption of large language models by developing a MatMul-free architecture adapted for Intel's neuromorphic processor Loihi 2, achieving up to 3x higher throughput and 2x less energy compared to transformer-based LLMs on an edge GPU.

Large language models (LLMs) deliver impressive performance but require large amounts of energy. In this work, we present a MatMul-free LLM architecture adapted for Intel's neuromorphic processor, Loihi 2. Our approach leverages Loihi 2's support for low-precision, event-driven computation and stateful processing. Our hardware-aware quantized model on GPU demonstrates that a 370M parameter MatMul-free model can be quantized with no accuracy loss. Based on preliminary results, we report up to 3x higher throughput with 2x less energy, compared to transformer-based LLMs on an edge GPU, with significantly better scaling. Further hardware optimizations will increase throughput and decrease energy consumption. These results show the potential of neuromorphic hardware for efficient inference and pave the way for efficient reasoning models capable of generating complex, long-form text rapidly and cost-effectively.
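
The abstract does not spell out how a model avoids matrix multiplication. One common approach in prior MatMul-free language-model work is to constrain weights to the ternary set {-1, 0, +1}, so that each dot product reduces to additions and subtractions. The following is a minimal sketch under that assumption. The function names `ternarize` and `ternary_matvec` and the absmean scaling rule are illustrative choices, not the authors' implementation, and nothing here is Loihi 2 code.

```python
def ternarize(weights):
    """Map a 2-D list of floats to ternary values plus one shared scale.

    Assumed scheme: divide by the mean absolute weight, round,
    then clip to the range [-1, 1].
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) or 1.0  # fall back to 1.0 if all weights are zero
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_matvec(ternary, scale, x):
    """Approximate W @ x using only adds and subtracts in the inner loop."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # weight +1: add the input
            elif t == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: skip the input entirely (no work done)
        out.append(acc * scale)  # a single rescale per output element
    return out


weights = [[0.9, -1.1, 0.05], [-0.8, 0.0, 1.2]]
ternary, scale = ternarize(weights)
y = ternary_matvec(ternary, scale, [2.0, 3.0, 5.0])
```

In this sketch, zero weights contribute no work at all, which is loosely the kind of sparsity that event-driven hardware can exploit. The only multiplication left is the one rescale per output element.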
