AXLE: Coordinated Offloading with Asynchronous Back-Streaming in Computational Memory Systems
This work addresses performance bottlenecks in disaggregated memory systems for workloads with diverse data and computation characteristics, offering a practical solution for near-memory processing.
AXLE introduces Asynchronous Back-Streaming, a new offloading protocol for CXL-based Computational Memory (CCM) systems that coordinates CXL.io and CXL.mem to reduce data movement costs. It reduces end-to-end runtime by up to 50.14%, reduces CCM and host idle times by 14.53x and 3.93x on average, respectively, and reduces host core stall time by up to 6x.
CXL-based Computational Memory (CCM) enables near-memory processing within expanded remote memory, offering opportunities to reduce data movement costs in disaggregated memory systems and to accelerate overall performance. However, existing offloading mechanisms do not fully exploit the trade-offs among offload models built on different CXL protocols. This work first examines these trade-offs and their impact on end-to-end performance and system efficiency for workloads with diverse data and computation characteristics. We then propose Asynchronous Back-Streaming, a new offloading protocol that coordinates CXL.io and CXL.mem to enable result back-streaming and asynchronous pipelining across CCM and host tasks. We further design AXLE, a system that realizes this protocol with lightweight host-CCM interaction. Overall, AXLE reduces end-to-end runtime by up to 50.14%, reduces CCM and host idle times by an average of 14.53x and 3.93x, respectively, and reduces host core stall time by up to 6x.
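To illustrate why asynchronous pipelining across CCM and host tasks can shorten end-to-end runtime, the following toy timing model compares a fully synchronous offload with a two-stage pipeline in which the host consumes each result chunk as soon as it is streamed back. This is a sketch under simplifying assumptions, not AXLE's implementation: the function names, the per-chunk times, and the omission of transfer costs, buffer limits, and any CXL.io or CXL.mem detail are all assumptions made for illustration.

```python
def synchronous_runtime(ccm_times, host_times):
    """Host waits for the CCM to finish every chunk, then processes all results."""
    return sum(ccm_times) + sum(host_times)


def pipelined_runtime(ccm_times, host_times):
    """Host processes chunk i while the CCM is already computing chunk i+1."""
    ccm_done = 0.0   # time at which the CCM finishes the current chunk
    host_done = 0.0  # time at which the host finishes the current chunk
    for ccm_t, host_t in zip(ccm_times, host_times):
        ccm_done += ccm_t
        # The host starts once the result is available and the host is free.
        host_done = max(host_done, ccm_done) + host_t
    return host_done


# Hypothetical per-chunk compute times in arbitrary units.
ccm_times = [4.0, 4.0, 4.0, 4.0]
host_times = [3.0, 3.0, 3.0, 3.0]

sync_total = synchronous_runtime(ccm_times, host_times)
pipe_total = pipelined_runtime(ccm_times, host_times)
print(f"synchronous: {sync_total}, pipelined: {pipe_total}")
```

In this model the pipelined schedule hides most of the host work behind CCM work, so the host waits only on the first chunk and on whichever side is slower per chunk. The figures it produces are illustrative only and are unrelated to the measured results reported above.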