DC · CV · PF · Dec 17, 2013

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

arXiv:1401.3615v1 · 16 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses performance bottlenecks in medical imaging for healthcare and research, but it is incremental as it adapts existing methods to a new hardware platform.

The researchers tackled the challenge of efficiently running the FDK algorithm for 3D cone-beam CT image reconstruction on the Intel Xeon Phi accelerator. They optimized the implementation through parallelization and SIMD vectorization, and used a refined performance model to validate the results.

We examine the Xeon Phi, which is based on Intel's Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform's new set of vector instructions and put a special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it for the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as the performance of several different auto-vectorization approaches.
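
To illustrate why vector gather matters for this workload, here is a minimal scalar sketch of a voxel-driven FDK-style backprojection step for one line of voxels and one projection image. It is not the paper's implementation. The function name `backproject_voxel_line`, the 3x4 projection matrix `P`, the row-major detector layout, and the 1/w² distance weighting are assumptions taken from a common textbook formulation.

```python
import math

def backproject_voxel_line(volume_line, proj, width, height, P, y, z):
    """Accumulate one projection image into one line of voxels (x varies).

    Hypothetical helper for illustration only.
    volume_line: list of floats, updated in place.
    proj: flat row-major detector image of size width * height.
    P: assumed 3x4 projection matrix mapping voxel (x, y, z, 1) to
       homogeneous detector coordinates (u, v, w).
    """
    for x in range(len(volume_line)):
        # Project the voxel onto the detector plane.
        u = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
        v = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        iu = u / w
        iv = v / w

        # Top-left pixel of the 2x2 interpolation neighbourhood.
        i0 = math.floor(iu)
        j0 = math.floor(iv)
        if i0 < 0 or j0 < 0 or i0 + 1 >= width or j0 + 1 >= height:
            continue  # the ray misses the detector; nothing to add

        fu = iu - i0
        fv = iv - j0

        # These four loads use data-dependent addresses. In a SIMD version,
        # neighbouring voxels map to scattered pixels, so each load turns
        # into a gather across the vector lanes.
        base = j0 * width + i0
        p00 = proj[base]
        p10 = proj[base + 1]
        p01 = proj[base + width]
        p11 = proj[base + width + 1]

        # Bilinear interpolation of the detector value.
        top = p00 + fu * (p10 - p00)
        bottom = p01 + fu * (p11 - p01)
        value = top + fv * (bottom - top)

        # Distance weighting (assumed 1/w^2) and accumulation.
        volume_line[x] += value / (w * w)
```

The arithmetic per voxel is regular and vectorizes easily; the four pixel loads are the irregular part, which is why the cost of the gather instruction is a natural focus when modeling performance for this kernel.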
