A Virtual Processor brings back the Free Lunch

arXiv:2605.305072.0

Predicted impact: top 58% in PF · last 90 days
Originality: Incremental advance

AI Analysis

This work aims to simplify parallel programming for developers of numerical array programs by automating parallelization, a task that remains a significant challenge in many scientific and engineering applications.

This paper introduces a self-optimizing virtual processor (VP) that automatically parallelizes numerical array programs at runtime, eliminating the need for manual parallelization by developers. The VP uses a decentralized network of cooperative execution segments that make local decisions about execution, leading to an emergent overall strategy.

This work introduces a self-optimizing virtual processor (VP) for numerical array programs that shifts parallelization from a manual developer task to a cooperative, agent-like runtime mechanism. Instead of relying on centralized task-graph scheduling, static compiler optimization, or explicitly annotated parallel constructs, the VP uses a decentralized network of cooperative execution segments, derived at runtime from the stream of numerical instructions and their data dependencies. Each segment makes only local decisions about when, where, and how to prepare and execute its computation, including task placement, kernel preparation, and data movement. No central scheduler or mapper instance determines the execution globally; instead, scheduling itself is parallelized and distributed over time, asynchronously and strictly dependency-driven. The overall execution strategy emerges from concurrently executing local segments that continuously respond to data availability, cost estimates, system state, hardware capabilities, and problem size. While preserving the sequential semantics of the program, our VP automatically exploits parallelism across large program regions rather than being limited to individual loop bodies, modules, or explicitly marked parallel sections; developers are not required to design or encode a parallelization strategy. The current VP primarily targets low-latency strong scaling on local heterogeneous hardware, covering workloads from small, latency-sensitive array operations to large data-parallel computations. The current implementation targets the predefined array instruction set of the ILNumerics.ONAL domain-specific language, while the underlying concept is applicable to general array-based numerical programming models such as MATLAB and NumPy.
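
The dependency-driven, scheduler-free execution described above can be illustrated with a toy sketch. This is not the paper's implementation and does not use the ILNumerics API. The names `Segment`, `choose_device`, `run_program`, and `build_demo` are hypothetical, the "device" choice is a made-up size threshold standing in for a real cost estimate, and plain Python lists stand in for arrays. The only point shown is that each segment counts its own outstanding inputs and launches itself once they arrive, so no central component plans the order of execution.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Segment:
    """One array instruction plus its own local scheduling state (hypothetical)."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)        # producer segments this one depends on
        self.consumers = []               # segments that depend on this one
        self.pending = len(self.inputs)   # inputs not yet available
        self.lock = threading.Lock()
        self.done = threading.Event()
        self.result = None
        self.device = None
        for producer in self.inputs:
            producer.consumers.append(self)

    def choose_device(self, args):
        # Local decision from a toy cost estimate (assumption: total input size).
        size = sum(len(a) for a in args)
        return "accelerator" if size >= 1000 else "cpu"

    def input_ready(self, pool):
        # Called by a producer; the segment launches itself on its last input.
        with self.lock:
            self.pending -= 1
            ready = self.pending == 0
        if ready:
            pool.submit(self.run, pool)

    def run(self, pool):
        args = [p.result for p in self.inputs]
        self.device = self.choose_device(args)
        self.result = self.fn(*args)
        self.done.set()
        for consumer in self.consumers:
            consumer.input_ready(pool)


def run_program(segments, workers=4):
    """Start the segments without inputs; everything else is triggered by data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for s in segments:
            if not s.inputs:
                pool.submit(s.run, pool)
        for s in segments:
            s.done.wait()  # keep the pool open until every segment has finished


def build_demo(n=2000):
    """Sequential program: a = 0..n-1; b = ones; c = a + b; d = 2 * a; e = sum(c) + sum(d)."""
    a = Segment("a", lambda: list(range(n)))
    b = Segment("b", lambda: [1] * n)
    c = Segment("c", lambda x, y: [p + q for p, q in zip(x, y)], [a, b])
    d = Segment("d", lambda x: [2 * p for p in x], [a])
    e = Segment("e", lambda x, y: sum(x) + sum(y), [c, d])
    return [a, b, c, d, e]


if __name__ == "__main__":
    program = build_demo()
    run_program(program)
    for seg in program:
        print(seg.name, seg.device)
    print("e =", program[-1].result)
```

In this sketch, `c` and `d` may run concurrently once `a` (and, for `c`, `b`) has finished, while `e` always sees completed inputs, so the result matches the sequential program. A real system of the kind the abstract describes would also have to handle kernel preparation, data movement, and error propagation, none of which is modeled here.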
