DC SEOct 28, 2012

Chunks and Tasks: a programming model for parallelization of dynamic algorithms

arXiv:1210.7427v130 citations

Originality Incremental advance

AI Analysis

This addresses the problem of simplifying parallel programming for complex dynamic applications, such as those requiring dynamic distribution of work and data, making it easier for application programmers to achieve high performance on clusters of multicore machines.

The authors tackled the challenge of parallelizing dynamic algorithms by proposing Chunks and Tasks, a programming model that abstracts data and work into chunks and tasks, enabling efficient implementation without explicit communication calls. They demonstrated its performance with a C++ library for sparse blocked matrix-matrix multiplication, showing concrete results in terms of high performance and user-friendliness.

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and Tasks library maps the chunks and tasks to physical resources. In this way we seek to combine user friendliness with high performance. An application programmer can express a parallel algorithm using a few simple building blocks, defining data and work objects and their relationships. No explicit communication calls are needed; the distribution of both work and data is handled by the Chunks and Tasks library. This makes efficient implementation of complex applications that require dynamic distribution of work and data easier. At the same time, Chunks and Tasks imposes restrictions on data access and task dependencies that facilitates the development of high performance parallel back ends. We discuss the fundamental abstractions underlying the programming model, as well as performance and fault resilience considerations. We also present a pilot C++ library implementation for clusters of multicore machines and demonstrate its performance for sparse blocked matrix-matrix multiplication.

View on arXiv PDF

Similar