DC PF PL SE · Apr 18, 2017

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Suejb Memeti, Lu Li, Sabri Pllana, Joanna Kolodziej, Christoph Kessler

arXiv:1704.05316v1 · 4.398 citations

Originality: Synthesis-oriented

AI Analysis

This study helps developers and researchers select appropriate parallel programming frameworks for heterogeneous computing systems, though it is incremental as it benchmarks existing methods without introducing new techniques.

The paper empirically compares OpenMP, OpenACC, OpenCL, and CUDA frameworks in terms of programming productivity, performance, and energy consumption on heterogeneous systems using SPEC and Rodinia benchmarks, finding that CUDA generally offers the best performance and energy efficiency but with lower productivity, while OpenACC provides a good balance.

Many modern parallel computing systems are heterogeneous at the node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures can be challenging. Various parallel programming frameworks exist (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that suits a target context is not straightforward. In this paper, we empirically study the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy consumption. To evaluate programming productivity, we use our homegrown tool CodeStat, which determines the percentage of code lines that were required to parallelize the code using a specific framework. We use our tool x-MeterPU to evaluate energy consumption and performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.
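The productivity metric described in the abstract (the percentage of code lines needed to parallelize a program with a given framework) can be illustrated with a minimal sketch. This is not CodeStat itself: the function name `framework_line_percentage` and the marker patterns below are assumptions chosen for illustration, and a simple regular-expression match per line is only a rough stand-in for whatever classification the authors' tool performs.

```python
import re

# Hypothetical per-framework markers; these are illustrative assumptions,
# not the rules used by the authors' CodeStat tool.
FRAMEWORK_PATTERNS = {
    "OpenMP": re.compile(r"#\s*pragma\s+omp\b|\bomp_\w+"),
    "OpenACC": re.compile(r"#\s*pragma\s+acc\b|\bacc_\w+"),
    "OpenCL": re.compile(r"\bcl[A-Z]\w*|\bcl_\w+|\b__kernel\b|\b__global\b"),
    "CUDA": re.compile(r"\bcuda[A-Z]\w*|\b__global__\b|\b__device__\b|<<<"),
}


def framework_line_percentage(source: str, framework: str) -> float:
    """Return the percentage of non-blank lines that match a framework marker."""
    pattern = FRAMEWORK_PATTERNS[framework]
    # Blank lines are ignored so that formatting does not skew the ratio.
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if pattern.search(line))
    return 100.0 * hits / len(lines)


if __name__ == "__main__":
    example = (
        "#include <omp.h>\n"
        "\n"
        "int main() {\n"
        "  #pragma omp parallel for\n"
        "  for (int i = 0; i < 4; ++i) {}\n"
        "}\n"
    )
    print(framework_line_percentage(example, "OpenMP"))
```

Under this definition, a lower percentage suggests that less framework-specific code was needed, which is one plausible reading of higher programming productivity.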
