Loop Control Management in Tightly Coupled Processor Arrays (TCPAs)
This work addresses loop-control inefficiencies in parallel loop accelerators for high-performance computing, offering an improvement specific to processor-array architectures.
The paper tackles the high control overhead of multidimensional loop kernels on Tightly Coupled Processor Arrays (TCPAs). It proposes a method to derive and reduce control signals from a polyhedral representation of the iteration space, achieving reductions of 15x to 45x across benchmarks, and introduces a lightweight global controller architecture. The entire control flow uses less than 10% of the total array resources.
Multidimensional loop kernels often suffer from control overhead that can dominate execution time on parallel loop accelerators. Tightly Coupled Processor Arrays (TCPAs) offload loop control to a global controller (GC), but existing approaches still require hundreds of control signals. We propose a method to derive and aggressively reduce these control conditions from a polyhedral representation of the iteration space, achieving reductions of 15x to 45x in control signals across several benchmarks. We introduce a lightweight GC architecture that evaluates conditions as unions of polyhedra using bounded evaluation units, requiring hardware comparable to a single processing element. Control signals are distributed throughout the array with a minimal number of delay elements, resulting in zero-overhead loop control. Our evaluation on PolyBench kernels shows that the entire control flow requires less than 10% of the total array resources.
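To make the notion of a control condition expressed as a union of polyhedra concrete, the following is a minimal software sketch, not the GC hardware described in the paper. It assumes a hypothetical 2D loop nest with bounds `N` and `M` and a hypothetical condition that holds on the first row or the last column of the iteration space; the names `in_polyhedron`, `in_union`, `first_row`, and `last_col` are illustrative only.

```python
from typing import List, Sequence, Tuple

# A half-space a . x <= b, stored as the pair (a, b).
HalfSpace = Tuple[Sequence[int], int]
# A polyhedron is the intersection of a list of half-spaces.
Polyhedron = List[HalfSpace]


def in_polyhedron(point: Sequence[int], poly: Polyhedron) -> bool:
    """Return True if the point satisfies every half-space of the polyhedron."""
    return all(sum(c * x for c, x in zip(a, point)) <= b for a, b in poly)


def in_union(point: Sequence[int], polys: List[Polyhedron]) -> bool:
    """Return True if the point lies in at least one polyhedron of the union."""
    return any(in_polyhedron(point, p) for p in polys)


# Hypothetical loop bounds (assumption, not taken from the paper).
N, M = 4, 3

# i == 0, written as the two inequalities i <= 0 and -i <= 0.
first_row: Polyhedron = [((1, 0), 0), ((-1, 0), 0)]
# j == M - 1, written as j <= M - 1 and -j <= -(M - 1).
last_col: Polyhedron = [((0, 1), M - 1), ((0, -1), -(M - 1))]

# The control condition is the union of the two polyhedra.
boundary = [first_row, last_col]

# Iterations of the N x M loop nest at which the control signal is asserted.
active = [(i, j) for i in range(N) for j in range(M) if in_union((i, j), boundary)]
```

In this sketch the condition is checked with a handful of integer comparisons per iteration, which gives a rough intuition for why a small, bounded set of evaluation units can suffice once the number of conditions has been reduced.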