AR · AI · Feb 6, 2024

ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System

arXiv:2402.04032v5 · h-index: 2 · IEEE Access
AI Analysis

This work addresses inference scalability for personalized recommendation systems and represents an incremental improvement over existing PIM methods.

The paper tackles the challenge of accelerating weight-sharing embedding layers in recommendation systems by proposing ProactivePIM, a processing-in-memory system that integrates a cache with prefetching and subtable mapping to eliminate communication overhead, achieving a 4.8x speedup over prior works.

The growing model size of personalized recommendation systems poses new challenges for inference. Weight-sharing algorithms have been proposed to reduce model size, but they increase the number of memory accesses. Recent advances in processing-in-memory (PIM) have improved model throughput by exploiting memory parallelism, but weight-sharing algorithms introduce massive CPU-PIM communication into prior PIM systems. We propose ProactivePIM, a PIM system for accelerating weight-sharing recommendation systems. ProactivePIM integrates a cache within the PIM with a prefetching scheme to leverage a unique locality of the algorithm, and it eliminates communication overhead through a subtable mapping strategy. ProactivePIM achieves a 4.8x speedup compared to prior works.
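
To make the size-versus-access trade-off concrete, here is a minimal Python sketch of one well-known weight-sharing scheme, the quotient-remainder trick. The abstract does not say which weight-sharing algorithms it targets, so the choice of scheme, the class name `QREmbedding`, and all parameters are assumptions for illustration only. The sketch does not model PIM hardware, the cache, prefetching, or subtable mapping.

```python
import random


def make_table(rows, dim, seed):
    """Build a small random table; stands in for trained embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


class QREmbedding:
    """Hypothetical quotient-remainder embedding.

    Two small subtables are shared across all IDs instead of storing
    one row per ID, which shrinks storage but doubles row reads.
    """

    def __init__(self, num_ids, num_buckets, dim, seed=0):
        self.num_buckets = num_buckets
        q_rows = -(-num_ids // num_buckets)  # ceiling division
        self.q_table = make_table(q_rows, dim, seed)
        self.r_table = make_table(num_buckets, dim, seed + 1)
        self.row_reads = 0  # counts subtable rows fetched so far

    def lookup(self, idx):
        # Each lookup touches one row in each subtable.
        q_row = self.q_table[idx // self.num_buckets]
        r_row = self.r_table[idx % self.num_buckets]
        self.row_reads += 2
        # Combine the two rows element-wise into the final embedding.
        return [a * b for a, b in zip(q_row, r_row)]

    def stored_rows(self):
        return len(self.q_table) + len(self.r_table)


emb = QREmbedding(num_ids=1_000_000, num_buckets=1_000, dim=4)
vec = emb.lookup(123_456)
print(emb.stored_rows(), emb.row_reads)  # 2000 rows stored, 2 rows read
```

In this sketch a table covering one million IDs is stored as 2,000 rows, but every lookup reads two rows instead of one. That extra traffic is the kind of increased memory access the abstract attributes to weight-sharing algorithms.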

