AR LG PL | Apr 14, 2025

Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures

Marco Siracusa, Olivia Hsu, Victor Soria-Pardos, Joshua Randall, Arnaud Grasset, Eric Biscondi, Doug Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, Adrià Armejach

arXiv:2504.09870v1 | 1.2 | h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses inefficiencies in embedding operations for large-scale AI models and offers a scalable way to improve performance and energy efficiency. It is an incremental advance because it builds on existing DAE architectures.

The paper tackles the bottleneck of irregular embedding lookups in recommender models, sparse large language models, and graph learning models. It first shows that Decoupled Access-Execute (DAE) processors, which offload these lookups to specialized access units, achieve 2.6x higher performance and 6.4x higher performance/watt than GPUs on end-to-end models. It then proposes the Ember compiler, which automatically generates optimized DAE code from PyTorch and TensorFlow.

Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these lookups to specialized access units, Decoupled Access-Execute (DAE) processors achieve 2.6$\times$ higher performance and 6.4$\times$ higher performance/watt than GPUs on end-to-end models. Then, we propose the Ember compiler for automatically generating optimized DAE code from PyTorch and TensorFlow. Unlike other DAE compilers, Ember features multiple intermediate representations, each designed for a different optimization level. In this way, Ember can implement all the optimizations needed to match the performance of hand-written code, unlocking the full potential of DAE architectures at scale.
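
To make the access/execute split concrete, here is a minimal pure-Python sketch of a sum-pooled embedding lookup divided into an access phase (irregular row gathers) and an execute phase (dense reduction). It only illustrates the general decoupling idea; it is not Ember's generated code or any of its intermediate representations. The function names and the queue are hypothetical, and the `offsets` layout is assumed to follow the bag-start convention used by PyTorch's `nn.EmbeddingBag`.

```python
from collections import deque


def access_phase(table, indices, offsets):
    """Access side (hypothetical): walk the index lists and gather embedding rows.

    `offsets[b]` is assumed to be the start of bag b inside `indices`.
    These data-dependent loads are the irregular part of the workload.
    """
    queue = deque()
    bounds = list(offsets) + [len(indices)]
    for bag in range(len(offsets)):
        rows = [table[i] for i in indices[bounds[bag]:bounds[bag + 1]]]
        queue.append(rows)
    return queue


def execute_phase(queue, dim):
    """Execute side (hypothetical): consume gathered rows and sum-reduce each bag."""
    out = []
    while queue:
        rows = queue.popleft()
        acc = [0.0] * dim
        for row in rows:
            for d in range(dim):
                acc[d] += row[d]
        out.append(acc)
    return out


def embedding_bag_sum(table, indices, offsets):
    """Run both phases back to back; on DAE hardware they would overlap."""
    dim = len(table[0])
    return execute_phase(access_phase(table, indices, offsets), dim)


# Toy data: 4 embedding rows of width 2, two bags ([0, 2] and [3, 1]).
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
indices = [0, 2, 3, 1]
offsets = [0, 2]
result = embedding_bag_sum(table, indices, offsets)
print(result)
```

In this sketch the queue stands in for the communication channel between the two sides: the access side only computes addresses and loads rows, while the execute side only performs arithmetic on data that has already arrived.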
