ARAug 4, 2023
Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference AccelerationPaul Chen, Pavan Manjunath, Sasindu Wijeratne et al.
Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc. GNN inference can be accelerated by exploiting data sparsity in the input graph, vertex features, and intermediate data in GNN computations. For dynamic sparsity exploitation, we leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference. We develop a custom hardware module that executes the sparse primitives of the computation kernel on the Programmable Logic (PL) and efficiently computes the dense primitives using the AI Engine (AIE). To exploit data sparsity during inference, we devise a runtime kernel mapping strategy that dynamically assigns computation tasks to the PL and AIE based on data sparsity. Our implementation on the VCK5000 ACAP platform leads to superior performance compared with the state-of-the-art implementations on CPU, GPU, ACAP, and other custom GNN accelerators. Compared with these implementations, we achieve significant average runtime speedup across various models and datasets of 162.42x, 17.01x, 9.90x, and 27.23x, respectively. Furthermore, for Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
ARMar 25
PowerFlow-DNN: Compiler-Directed Fine-Grained Power Orchestration for End-to-End Edge AI InferencePaul Chen, Jeongeun Kim, Wenbo Zhu et al.
Edge AI systems often operate under stringent energy and volume constraints that demand extreme efficiency under limited battery capacity, with requirements worsening as intelligent capability demands advance. Prior literature suggests that fine-grained power orchestration, including DVFS and power gating, enables significant energy efficiency benefits that cannot be left unexploited, while still exhibiting unexplored challenges. We observe that layer-level approaches incur unintended overheads due to inter-layer coupling of power control decisions, and that jointly managing these mechanisms under practical constraints such as limited voltage rails and transition overheads leads to a rapidly growing combinatorial schedule space. To address this, we propose PowerFlow-DNN, a compiler-directed framework for end-to-end power-state orchestration in ultra-low-power accelerators. By constructing a rigorous problem formulation for deadline-constrained, real-time, periodic inference as a unified inter-layer power-scheduling problem, our framework enables automated discovery of energy-minimal power-state schedules that adhere to a deadline while accounting for end-to-end, inter-layer impacts. We evaluate the framework on a DNN accelerator VLSI implementation in TSMC 40nm technology. Across representative edge networks, we show that PowerFlow-DNN discovers near-optimal solutions under the discretized formulation and achieves energy within 0.68\% of the exact ILP oracle, reducing energy by up to 37\% compared to an aggressive baseline without power orchestration, while reasoning over a combinatorial schedule space of over $10^{160}$ possible power-state assignments, yet operating on a structured layered state graph that enables efficient optimization, achieving up to 2.14$\times$ solver speedup via lightweight pruning.
AIMar 26, 2025
The Art of Tool Interface DesignYunnan Wu, Paul Chen, Deshank Baranwal et al.
We present an agentic framework, Thinker, which achieves state of art performance in challenging reasoning tasks for realistic customer service scenarios that involve complex business logic and human interactions via long horizons. On the $τ$-bench retail dataset, Thinker achieves 82.6\% success rate with GPT-4o (version 2024-06-01) (baseline: 68.3\%), and 81.9\% success rate with Llama-3.1 405B (baseline: 49.6\%), without any fine-tuning. Thinker effectively closes the gap in reasoning capabilities between the base models by introducing proper structure. The key features of the Thinker framework are: (1) State-Machine Augmented Generation (SMAG), which represents business logic as state machines and the LLM uses state machines as tools. (2) Delegation of tasks from the main reasoning loop to LLM-powered tools. (3) Adaptive context management. Our prompting-only solution achieves signficant gains, while still maintaining a standard agentic architecture with a ReAct style reasoning loop. The key is to innovate on the tool interface design, as exemplified by SMAG and the LLM-powered tools.