AR · AI · LG · Oct 5, 2021

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU

arXiv:2110.01752v1 · 25 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues that CPU vendors face when incorporating matrix engines to boost AI application performance. It represents an incremental improvement in architectural design.

The paper tackles under-utilization and stalls in CPU-integrated systolic array matrix engines, which arise because limited register storage cannot amortize the array's fill and drain times. It proposes RASA, which divides an execution stage into sub-stages and overlaps instructions to hide these overheads, and reports significant performance improvements with negligible area and power overhead.

As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency. Systolic arrays have been the premier architectural choice as matrix engines in offload accelerators. However, we demonstrate that incorporating them inside CPUs can introduce under-utilization and stalls due to limited register storage to amortize the fill and drain times of the array. To address this, we propose RASA, Register-Aware Systolic Array. We develop techniques to divide an execution stage into several sub-stages and overlap instructions to hide overheads and run them concurrently. RASA-based designs improve performance significantly with negligible area and power overhead.
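The amortization argument in the abstract can be illustrated with a toy cycle-count model. This is a minimal sketch, not the paper's performance model: the function names, the three-phase split (fill, stream, drain), and the example cycle counts are all assumptions chosen for illustration, and the overlapped case is an idealized upper bound in which every neighbouring fill and drain is fully hidden.

```python
def serial_cycles(n_instr: int, stream: int, fill: int, drain: int) -> int:
    """Cycles when every matrix instruction fills, streams, and drains
    completely before the next one starts (no overlap)."""
    return n_instr * (fill + stream + drain)


def overlapped_cycles(n_instr: int, stream: int, fill: int, drain: int) -> int:
    """Idealized overlap (assumption): the fill and drain of neighbouring
    instructions are hidden behind streaming, so they are paid only once."""
    return fill + n_instr * stream + drain


def utilization(n_instr: int, stream: int, total_cycles: int) -> float:
    """Fraction of cycles in which the array is streaming useful data."""
    return (n_instr * stream) / total_cycles


if __name__ == "__main__":
    # Hypothetical numbers: a small register tile supplies only 16 rows
    # per instruction, while fill and drain each cost 16 cycles.
    n, stream, fill, drain = 8, 16, 16, 16

    serial = serial_cycles(n, stream, fill, drain)
    overlapped = overlapped_cycles(n, stream, fill, drain)

    print(f"serial:     {serial} cycles, "
          f"utilization {utilization(n, stream, serial):.2f}")
    print(f"overlapped: {overlapped} cycles, "
          f"utilization {utilization(n, stream, overlapped):.2f}")
```

Under these assumed numbers the streaming phase is no longer than the fill or drain, so the serial schedule spends most of its cycles on overhead. A larger `stream` value (more register storage per instruction) or overlapping across instructions both raise utilization in this model, which is the trade-off the abstract describes.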

