DC · May 4

VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU

arXiv:2605.03190 · 63.0 · Has Code
Predicted impact top 19% in DC · last 90 days · Originality Highly original
AI Analysis

For GPU programmers, VDCores addresses the underutilization of specialized asynchronous hardware units by removing static orchestration and enabling automatic overlap of memory and compute.

VDCores introduces a decoupled programming and execution model for asynchronous GPUs, improving decoding throughput by 24% on average and up to 77% under dynamic inputs across LLM inference workloads, while reducing kernel programming effort by 90%.

Modern GPUs increasingly rely on specialized and asynchronous hardware units to deliver high performance. Yet these units are often underutilized because today's GPU software stacks still organize programming and execution around a monolithic kernel model that mismatches asynchronous hardware. To address this issue, Virtual Decoupled Engines (VDCores) presents a new decoupled programming and execution model for asynchronous GPUs. VDCores abstracts asynchronous hardware execution units as resource-isolated virtual cores and represents workloads as dependency-connected micro-operations (micro-ops). This abstraction removes static orchestration from the programmer, enables automatic overlap of memory and compute based on dependency and resource readiness, and thereby improves utilization of asynchronous hardware resources. Realizing such a decoupled abstraction efficiently on today's GPUs is itself challenging; VDCores addresses this through a GPU-specialized programming model and GPU runtime design that preserve this flexibility while minimizing implementation overhead. Across four LLM inference workloads on GH200, H100, and RTX 6000 Pro GPUs, VDCores improves decoding throughput by 24% on average and by up to 77% under dynamic inputs, while reducing kernel programming and specialization effort by 90%. We have open-sourced VDCores at https://github.com/vdcores/vdcores.
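To make the idea of dependency-connected micro-ops running on resource-isolated virtual cores more concrete, here is a minimal toy simulation in Python. It is not the VDCores API or runtime: the names `MicroOp` and `simulate`, the core labels `"mem"` and `"compute"`, and the cycle counts are all hypothetical, chosen only to illustrate how dispatching on dependency and resource readiness can overlap memory and compute without a hand-written schedule.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str          # hypothetical identifier for the micro-op
    core: str          # virtual core it runs on, e.g. "mem" or "compute"
    cycles: int        # assumed duration in abstract time units
    deps: tuple = ()   # names of micro-ops that must finish first


def simulate(ops):
    """Greedy readiness-driven schedule: return (makespan, start_times)."""
    finish = {}      # micro-op name -> finish time
    start = {}       # micro-op name -> start time
    core_free = {}   # core name -> time at which the core becomes idle
    pending = list(ops)

    def earliest(op):
        # An op may start once its dependencies are done AND its core is idle.
        dep_ready = max((finish[d] for d in op.deps), default=0)
        return max(dep_ready, core_free.get(op.core, 0))

    while pending:
        # Only ops whose dependencies have already been scheduled are ready.
        ready = [op for op in pending if all(d in finish for d in op.deps)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        op = min(ready, key=earliest)  # dispatch whichever can start soonest
        t = earliest(op)
        start[op.name] = t
        finish[op.name] = t + op.cycles
        core_free[op.core] = finish[op.name]
        pending.remove(op)

    return max(finish.values()), start


# Two tiles, each needing a load followed by a compute step (assumed costs).
ops = [
    MicroOp("load0", "mem", 4),
    MicroOp("mma0", "compute", 4, deps=("load0",)),
    MicroOp("load1", "mem", 4),
    MicroOp("mma1", "compute", 4, deps=("load1",)),
]

makespan, start = simulate(ops)
serial = sum(op.cycles for op in ops)  # cost if nothing overlapped
print(f"decoupled: {makespan} units, serial: {serial} units")
```

In this sketch the second load starts while the first compute step is still running, because the two ops occupy different virtual cores and have no dependency between them. The program only declares dependencies; the overlap comes from the dispatch rule, not from the order in which the ops were written.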

