Warp-STAR: High-performance, Differentiable GPU-Accelerated Static Timing Analysis through Warp-oriented Parallel Orchestration
This work removes intra-warp load imbalance in GPU-based static timing analysis (STA), speeding up both timing analysis and timing-driven placement in Electronic Design Automation (EDA) flows.
The paper tackles the computational bottleneck of Static Timing Analysis (STA) in Electronic Design Automation by introducing Warp-STAR, a GPU-accelerated engine that eliminates intra-warp load imbalance. It achieves a 2.4X speedup over the previous state-of-the-art GPU-based STA and a 1.7X speedup in timing-driven global placement.
Static timing analysis (STA) is crucial for Electronic Design Automation (EDA) flows but remains a computational bottleneck. While existing GPU-based STA engines are faster than their CPU-based counterparts, they still suffer from inefficiencies, particularly intra-warp load imbalance caused by irregular circuit graphs. This paper introduces Warp-STAR, a novel GPU-accelerated STA engine that eliminates this imbalance by orchestrating parallel computations at the warp level. This approach achieves a 2.4X speedup over the previous state-of-the-art (SoTA) GPU-based STA. When integrated into a timing-driven global placement framework, Warp-STAR delivers a 1.7X speedup over SoTA frameworks. The method also proves effective for differentiable gradient analysis with minimal overhead.
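
To make the notion of intra-warp load imbalance concrete, the following is a toy cost model and not the paper's algorithm. It assumes a warp of 32 lanes that executes in lockstep, so a warp finishes only when its busiest lane does. The function names, the fan-in list, and the idealized even split of edges across lanes are all hypothetical and chosen purely for illustration.

```python
import math

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def thread_per_node_steps(fanins, warp_size=WARP_SIZE):
    """One lane per node: each warp waits for its highest fan-in node."""
    steps = 0
    for start in range(0, len(fanins), warp_size):
        steps += max(fanins[start:start + warp_size])
    return steps


def warp_cooperative_steps(fanins, warp_size=WARP_SIZE):
    """Idealized assumption: the lanes of a warp split that group's edges evenly."""
    steps = 0
    for start in range(0, len(fanins), warp_size):
        group_edges = sum(fanins[start:start + warp_size])
        steps += math.ceil(group_edges / warp_size)
    return steps


# Hypothetical graph level: 31 two-input gates plus one node with 64 fan-in edges.
fanins = [2] * 31 + [64]
print(thread_per_node_steps(fanins))   # 64
print(warp_cooperative_steps(fanins))  # 4
```

In this model the single high fan-in node stalls the other 31 lanes for 64 steps under a one-lane-per-node mapping, while an even split of the 126 edges needs only 4 steps. A real engine would also pay for synchronization and memory access, so the ratio here is not a prediction of the reported speedups.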