OC CV DC NA | Aug 13, 2018

CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration

Andreas Mang, Amir Gholami, Christos Davatzikos, George Biros

arXiv:1808.04487v2 | 40 citations

AI Analysis

This work provides a faster solver for medical image registration. The contribution is incremental: it builds on existing methods and improves their scalability and performance.

The authors tackle large deformation diffeomorphic image registration in three dimensions by developing CLAIRE, a distributed-memory solver. Compared to the authors' former work, CLAIRE achieves an average speedup of 5x and a peak speedup of up to 17x, and it solves clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores.

With this work, we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss--Newton--Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an effective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against different flavors of the Demons algorithm for diffeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5$\times$ with a peak performance of up to 17$\times$ compared to our former work.
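The abstract names a globalized, inexact Gauss-Newton-Krylov scheme without spelling out its structure. The following is a minimal sketch of that general pattern on a toy two-variable least-squares problem, not CLAIRE's implementation. All names (`gauss_newton_krylov`, `residual`, `jacobian`, `beta`) are hypothetical. The sketch assumes an Armijo backtracking line search as the globalization, unpreconditioned conjugate gradients as the Krylov solver, and a forcing term of the form min(0.5, sqrt(relative gradient norm)); the abstract does not confirm any of these specific choices.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gauss_newton_krylov(residual, jacobian, x, beta=0.0, tol=1e-8, max_newton=50):
    """Minimize 0.5*||r(x)||^2 + 0.5*beta*||x||^2 with inexact Gauss-Newton-CG.

    Toy illustration only; assumes J^T J + beta*I is positive definite.
    """
    n = len(x)

    def objective(z):
        r = residual(z)
        return 0.5 * dot(r, r) + 0.5 * beta * dot(z, z)

    g0_norm = None
    for _ in range(max_newton):
        r, J = residual(x), jacobian(x)
        # Gradient of the objective: g = J^T r + beta * x.
        g = [sum(J[i][j] * r[i] for i in range(len(r))) + beta * x[j]
             for j in range(n)]
        g_norm = math.sqrt(dot(g, g))
        if g0_norm is None:
            g0_norm = g_norm
        if g_norm <= tol * g0_norm:
            break

        def hess_matvec(v):
            # Matrix-free Gauss-Newton Hessian product: (J^T J + beta*I) v.
            Jv = [dot(row, v) for row in J]
            return [sum(J[i][j] * Jv[i] for i in range(len(J))) + beta * v[j]
                    for j in range(n)]

        # Inexact solve of H p = -g: stop CG once the linear residual drops
        # below eta * ||g||, with eta tightening as the gradient shrinks.
        eta = min(0.5, math.sqrt(g_norm / g0_norm))
        p = [0.0] * n
        res = [-gi for gi in g]
        d = list(res)
        rs = dot(res, res)
        for _ in range(10 * n):
            if math.sqrt(rs) <= eta * g_norm:
                break
            Hd = hess_matvec(d)
            alpha = rs / dot(d, Hd)
            p = [pi + alpha * di for pi, di in zip(p, d)]
            res = [ri - alpha * hi for ri, hi in zip(res, Hd)]
            rs_new = dot(res, res)
            d = [ri + (rs_new / rs) * di for ri, di in zip(res, d)]
            rs = rs_new

        # Globalization: Armijo backtracking along the search direction p.
        f0, slope, step = objective(x), dot(g, p), 1.0
        while step > 1e-12:
            trial = [xi + step * pi for xi, pi in zip(x, p)]
            if objective(trial) <= f0 + 1e-4 * step * slope:
                break
            step *= 0.5
        x = trial
    return x

# Hypothetical toy problem: two weakly coupled nonlinear equations.
def residual(x):
    return [x[0] + 0.1 * math.sin(x[1]) - 1.0,
            x[1] + 0.1 * math.sin(x[0]) - 2.0]

def jacobian(x):
    return [[1.0, 0.1 * math.cos(x[1])],
            [0.1 * math.cos(x[0]), 1.0]]

x_star = gauss_newton_krylov(residual, jacobian, [0.0, 0.0])
```

The Hessian is only ever applied to a vector and never assembled, which is what makes this family of methods usable when the unknown is a full three-dimensional velocity field. In a solver of the kind the abstract describes, the CG loop would additionally apply a preconditioner to the reduced space Hessian; the sketch omits that step.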
