ARLGSep 4, 2025

Hardware-Aware Data and Instruction Mapping for AI Tasks: Balancing Parallelism, I/O and Memory Tradeoffs

arXiv:2509.03846v11 citationsh-index: 3
Originality Incremental advance
AI Analysis

This work addresses hardware efficiency for AI inference, offering incremental improvements in mapping techniques for programmable architectures.

The paper tackles the challenge of optimizing deep learning inference by introducing a mapping framework that balances parallelism, I/O, and memory tradeoffs, achieving high utilization (88-92%) and throughput scaling beyond 1 TFLOP/s with significant traffic reductions (up to 100 MB per layer).

We introduce a mapping framework for deep learning inference that takes advantage of predictable neural network behavior to plan both computation and communication ahead of time. The framework generates a unified stream of instructions and data, enabling the hardware to execute operations and route information on its own, without frequent involvement from the host and with minimal off-chip memory use. This naturally reduces reliance on I/O, off-chip memory, and host control. By leveraging fine-grained message passing on a programmable, message-based compute architecture, the framework keeps data movement local and coordinates computation across the array using techniques such as stationary-weight reuse, in-array multicasting, and staged reductions. Applied to VGG-19, the framework sustains high utilization (88 to 92 percent), with over 97 percent of messages generated internally and nearly 89 percent of time consumed on-chip transfers. Computation throughput scales beyond 1 TFLOP/s on larger arrays, while traffic reductions from reuse and local aggregation reach up to 100 MB per layer. Overall, the results highlight the effectiveness of streaming-based computation and show how our mapper enables this execution style by tightly coordinating data and instruction flow across the hardware.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes