Thomas Sterling

DC | Feb 8, 2024

Rhizomes and Diffusions for Processing Highly Skewed Graphs on Fine-Grain Message-Driven Systems

Bibrak Qamar Chandio, Prateek Srivastava, Maciej Brodowicz et al.

The paper provides a unified co-design of 1) a programming and execution model that allows spawning tasks from within the vertex data at runtime, 2) language constructs for actions that send work to where the data resides, combining the parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, and 3) an innovative vertex-centric data structure, using the concept of Rhizomes, that parallelizes both the out-degree and in-degree load of vertex objects across many cores yet provides a single programming abstraction to the vertex objects. The data structure parallelizes the out-degree load of vertices hierarchically and the in-degree load laterally. The rhizomes communicate internally and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex. Simulated experimental results show performance gains for BFS, SSSP, and PageRank on large chip sizes for the tested input graph datasets, which have highly skewed degree distributions. The improvements come from the ability to express and create fine-grain dynamic computing tasks in the form of actions, from language constructs that help the compiler generate code that the runtime system uses to optimally schedule tasks, and from a data structure that shares both the in-degree and out-degree compute workload among memory-processing elements.
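The abstract describes the Rhizome layout only at a high level. As a rough illustration, the following minimal Python sketch assumes a simplified model: each vertex is split into ceil(in-degree / cutoff) rhizomes, each rhizome receives a contiguous share of the vertex's incoming edges (the lateral in-degree split), outgoing edges are dealt round-robin across the rhizomes (a flat stand-in for the paper's hierarchical out-degree split), and a rhizome that improves its BFS level sends a sibling-sync message so that the replicas converge. The names `Rhizome`, `build_rhizomes`, `bfs`, and the `cutoff` parameter are hypothetical, and the FIFO queue only simulates message-driven execution on a single thread.

```python
import math
from collections import deque
from dataclasses import dataclass, field

INF = math.inf


@dataclass
class Rhizome:
    """One replica of a vertex; owns a share of its in- and out-edges (hypothetical model)."""
    vertex: int
    level: float = INF
    # Out-edges owned by this replica, as (target_vertex, target_rhizome_index).
    out_edges: list = field(default_factory=list)


def build_rhizomes(n, edges, cutoff):
    """Split each vertex into ceil(in_degree / cutoff) rhizomes (at least one)."""
    in_deg = [0] * n
    for _, v in edges:
        in_deg[v] += 1
    rh = [[Rhizome(v) for _ in range(max(1, math.ceil(in_deg[v] / cutoff)))]
          for v in range(n)]
    seen_in = [0] * n   # incoming edges assigned so far, per target vertex
    sent_out = [0] * n  # outgoing edges assigned so far, per source vertex
    for u, v in edges:
        # The k-th in-edge of v lands on rhizome k // cutoff (lateral in-degree split).
        tgt = seen_in[v] // cutoff
        seen_in[v] += 1
        # Out-edges of u are dealt round-robin over u's rhizomes.
        src = sent_out[u] % len(rh[u])
        sent_out[u] += 1
        rh[u][src].out_edges.append((v, tgt))
    return rh


def bfs(rh, root):
    """Asynchronous, label-correcting BFS; each queued message models one action."""
    # Message: (vertex, rhizome_index, proposed_level, from_sibling)
    queue = deque([(root, 0, 0, False)])
    while queue:
        v, i, lvl, from_sibling = queue.popleft()
        r = rh[v][i]
        if lvl >= r.level:
            continue  # no improvement: the action has no effect
        r.level = lvl
        if not from_sibling:
            # Event-driven sync: push the new level to every sibling rhizome.
            for j in range(len(rh[v])):
                if j != i:
                    queue.append((v, j, lvl, True))
        # Each rhizome relaxes only its own share of the out-edges.
        for w, k in r.out_edges:
            queue.append((w, k, lvl + 1, False))
    # At quiescence all rhizomes of a vertex agree; report one level per vertex.
    return [min(r.level for r in reps) for reps in rh]
```

For example, with `cutoff=2` a vertex with four incoming edges becomes two rhizomes, and each one only receives actions arriving on its own two edges. Because every improvement is broadcast to the siblings and every update relaxes that rhizome's out-edges, the replicas end with the same level, so taking the minimum per vertex yields the ordinary BFS distance (unreachable vertices stay at infinity).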

DC | Jul 1, 2019

Fully-Asynchronous Fully-Implicit Variable-Order Variable-Timestep Simulation of Neural Networks

Bruno Magalhães, Michael Hines, Thomas Sterling et al.

State-of-the-art simulations of detailed neural models follow the Bulk Synchronous Parallel (BSP) execution model. Execution is divided into equidistant communication intervals, equal to the shortest synaptic delay in the network. Neuron stepping is performed independently, with collective communication guiding synchronization and the exchange of synaptic events. The interpolation step size is fixed and chosen based on prior knowledge of the fastest possible dynamics in the system. However, simulations driven by stiff dynamics or a wide range of time scales, such as multiscale simulations of neural networks, struggle with fixed-step interpolation methods: they perform excessive computation during intervals of quasi-constant activity, interpolate periods of highly volatile solutions inaccurately, and cannot handle unknown or distinct time constants. A common alternative is to use adaptive stepping methods; however, these have been deemed inefficient in parallel executions because of the computational load imbalance at the synchronization barriers that characterize the BSP execution model. We introduce a distributed, fully asynchronous execution model that removes global synchronization, allowing for longer variable-timestep interpolations. Asynchronicity is provided by active point-to-point communication that notifies synaptically connected neurons of each neuron's time advancement. Time stepping is driven by scheduled neuron advancements based on synaptic delays across neurons, yielding an "exhaustive yet not speculative" adaptive-step execution. Execution benchmarks on 64 Cray XE6 compute nodes demonstrate fewer interpolation steps, higher numerical accuracy, and lower time to solution compared to state-of-the-art methods. Efficiency is shown to be activity-dependent, and the scaling of the algorithm is demonstrated on a simulation of a laboratory experiment.
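The abstract does not spell out the scheduling rule. A minimal Python sketch, assuming a conservative rule in the style of parallel discrete-event simulation: a neuron i may safely advance to min over presynaptic neurons j of (t_j + d_ji), because no spike from j can arrive earlier than that. The function `async_schedule`, the `delays` layout, and the single "step" per advancement (a stand-in for a variable-step implicit integration over the whole interval) are assumptions for illustration, not the paper's implementation.

```python
import heapq


def async_schedule(delays, t_end):
    """Advance neurons without a global barrier (hypothetical conservative rule).

    delays[i] maps each presynaptic neuron j to the synaptic delay d_ji (> 0).
    Returns the final time of every neuron and how many advancements each made.
    """
    n = len(delays)
    t = [0.0] * n          # time each neuron has been integrated up to
    steps = [0] * n        # scheduling events per neuron (not integrator steps)
    heap = [(0.0, i) for i in range(n)]  # exactly one entry per unfinished neuron
    heapq.heapify(heap)
    while heap:
        ti, i = heapq.heappop(heap)  # the neuron that is furthest behind
        # Safe horizon: a spike emitted by j after t[j] arrives no earlier than t[j] + d_ji.
        horizon = min((t[j] + d for j, d in delays[i].items()), default=t_end)
        horizon = min(horizon, t_end)
        # Stand-in for a variable-step implicit integration over [ti, horizon];
        # in a real system the new time would be sent point-to-point to targets.
        t[i] = horizon
        steps[i] += 1
        if horizon < t_end:
            heapq.heappush(heap, (horizon, i))
    return t, steps
```

For two neurons coupled in both directions with a delay of 1.0 time units and a horizon of 3.0, `async_schedule([{1: 1.0}, {0: 1.0}], 3.0)` returns `([3.0, 3.0], [2, 2])`: each neuron advances twice, whereas a BSP scheme whose communication interval equals the 1.0 minimum delay would need three intervals. Because the neuron furthest behind always gains at least the minimum delay, the loop terminates, and a neuron with no presynaptic inputs reaches `t_end` in a single advancement.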
