Source-to-Source Automatic Differentiation of OpenMP Parallel Loops
This work addresses the cost of gradient computation, a common bottleneck for parallelized applications in optimization and machine learning. The contribution is incremental in that it extends existing automatic differentiation methods to OpenMP.
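The paper tackles the problem of automatically differentiating OpenMP parallel worksharing loops so that gradients can be computed in parallel. It proposes a framework for reasoning about the correctness of the generated derivative code and implements the resulting differentiation model in Tapenade. In performance tests, the generated derivative programs are faster than sequential execution in both forward and reverse mode, although the reverse mode often scales worse than the input programs.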
This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost of computing gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. The generated derivative programs in forward and reverse mode run faster than sequential execution, although our reverse mode often scales worse than the input programs.
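To make the setting concrete, the following is a minimal hand-written sketch in C of what forward-mode and reverse-mode derivatives of a single OpenMP worksharing loop might look like. It is not Tapenade output and does not reproduce the paper's differentiation model. The function names (`scale_square`, with `_d` and `_b` suffixes for the tangent and adjoint versions) and the use of a `reduction` clause for the shared adjoint are assumptions made purely for illustration.

```c
#include <assert.h>

/* Primal loop (hypothetical example): y[i] = c * x[i]^2.
   The scalar c is shared and only read, so iterations are independent. */
void scale_square(int n, double c, const double *x, double *y) {
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = c * x[i] * x[i];
    }
}

/* Forward (tangent) mode: derivatives flow in the same direction as the
   primal data, so the loop keeps the same access pattern and can reuse
   the same worksharing directive. */
void scale_square_d(int n, double c, double cd,
                    const double *x, const double *xd,
                    double *y, double *yd) {
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        yd[i] = cd * x[i] * x[i] + 2.0 * c * x[i] * xd[i];
        y[i] = c * x[i] * x[i];
    }
}

/* Reverse (adjoint) mode: every read of the shared scalar c in the primal
   becomes an increment of its adjoint cb. Concurrent increments would race,
   so this sketch accumulates them with a reduction (an assumption; atomic
   updates would be another option). */
void scale_square_b(int n, double c, double *cb,
                    const double *x, double *xb, double *yb) {
    assert(n >= 0);
    double cb_sum = 0.0;
    #pragma omp parallel for reduction(+:cb_sum)
    for (int i = 0; i < n; ++i) {
        xb[i] += 2.0 * c * x[i] * yb[i];  /* adjoint of x[i], private to iteration i */
        cb_sum += x[i] * x[i] * yb[i];    /* contribution to the shared adjoint of c */
        yb[i] = 0.0;                      /* y[i] was overwritten in the primal */
    }
    *cb += cb_sum;
}
```

The extra synchronization needed for the shared adjoint in the reverse-mode loop gives some intuition for why adjoint code can scale worse than the loop it was derived from. Compiled without OpenMP support, the pragmas are ignored and the three functions run sequentially with the same results.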