Mohammad Zubair

2 Papers

DC | Jan 22 | Code
ZEUS: An Efficient GPU Optimization Method Integrating PSO, BFGS, and Automatic Differentiation

Dominik Soos, Marc Paterno, Desh Ranjan et al.

We introduce ZEUS, a novel, efficient computational method for numerical optimization, and provide an open-source implementation. It has four key ingredients: (1) particle swarm optimization (PSO), (2) the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, (3) automatic differentiation (AD), and (4) GPUs. Our approach addresses the computational challenges inherent in high-dimensional, non-convex optimization problems. In the first phase of the algorithm, PSO produces a set of potentially good starting points. BFGS then runs independently and in parallel from each of these starting points. BFGS is one of the best-performing algorithms for numerical optimization, but it requires the gradient of the function being optimized. ZEUS integrates automatic differentiation into BFGS, so the user does not need to calculate derivatives explicitly. The use of GPUs allows ZEUS to speed up the calculations substantially. We carry out systematic studies to explore the trade-offs between the number of PSO iterations, the number of starting points, and the BFGS iteration depth. We show that a handful of PSO iterations can improve global convergence when combined with BFGS. We also present performance studies using common test functions. The source code can be found at https://github.com/fnal-numerics/global-optimizer-gpu.
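The two-phase idea in the abstract can be illustrated with a minimal CPU-only sketch. It is not the ZEUS implementation: it assumes a 2-D Himmelblau test function, forward-mode AD via a small hypothetical `Dual` class, hand-picked PSO coefficients (0.7, 1.5, 1.5), and a backtracking line search, and it runs the BFGS solves sequentially where ZEUS runs them in parallel on a GPU. All function names here are illustrative.

```python
import random

class Dual:
    """Forward-mode AD value val + dot*eps with eps**2 == 0 (illustrative)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)
    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __rsub__(self, o):
        return Dual.lift(o) - self
    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def himmelblau(p):
    """Non-convex test function whose four minima all have value 0."""
    x, y = p
    a, b = x * x + y - 11.0, x + y * y - 7.0
    return a * a + b * b

def grad(f, p):
    """Gradient from one forward-mode pass per coordinate (no hand-written derivative)."""
    return [f([Dual(v, float(j == i)) for j, v in enumerate(p)]).dot
            for i in range(len(p))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pso(f, n, iters, lo, hi, dim, rng):
    """A few PSO iterations; returns each particle's best position so far."""
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    best, best_f = [p[:] for p in pos], [f(p) for p in pos]
    for _ in range(iters):
        g = best[min(range(n), key=best_f.__getitem__)]  # swarm-wide best
        for i in range(n):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < best_f[i]:
                best[i], best_f[i] = pos[i][:], fi
    return best

def bfgs(f, x, max_iter=200, tol=1e-8):
    """BFGS with an inverse-Hessian estimate H and Armijo backtracking."""
    n = len(x)
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    fx, g = f(x), grad(f, x)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = [-dot(row, g) for row in H]
        if dot(d, g) >= 0.0:  # not a descent direction: use steepest descent
            d = [-gi for gi in g]
        slope, t = dot(g, d), 1.0
        x_new = [xi + t * di for xi, di in zip(x, d)]
        while f(x_new) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
            x_new = [xi + t * di for xi, di in zip(x, d)]
        f_new = f(x_new)
        if f_new >= fx:  # line search made no progress
            break
        g_new = grad(f, x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # update H only when the curvature condition holds
            rho, Hy = 1.0 / sy, [dot(row, y) for row in H]
            c = (1.0 + rho * dot(y, Hy)) * rho
            for i in range(n):
                for j in range(n):
                    H[i][j] += c * s[i] * s[j] - rho * (Hy[i] * s[j] + s[i] * Hy[j])
        x, fx, g = x_new, f_new, g_new
    return x, fx

def pso_then_bfgs(f, dim=2, n=8, pso_iters=5, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    starts = pso(f, n, pso_iters, lo, hi, dim, rng)
    # Each local solve is independent of the others, which is what makes
    # the second phase straightforward to parallelize.
    return min((bfgs(f, s) for s in starts), key=lambda r: r[1])

best_x, best_f = pso_then_bfgs(himmelblau)
```

Because the BFGS runs share no state, the number of starting points can be scaled without changing the local solver; the trade-off between PSO iterations and BFGS depth mentioned in the abstract corresponds to the `pso_iters` and `max_iter` parameters in this sketch.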

LG | Nov 1, 2023
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Mohammad Zubair, Christoph Bauinger

In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications: the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of SDDMM with SPMM, also termed FusedMM. We develop optimized implementations of the SPMM, SDDMM, and FusedMM operations using Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code. Sparse matrix algorithms implemented with the ESIMD API achieved performance close to the peak of the targeted Intel Data Center GPU. We compare our performance results to Intel's oneMKL library on Intel GPUs and to a recent CUDA implementation of the sparse matrix operations on NVIDIA's V100 GPU, and demonstrate that our implementations outperform both.
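For readers unfamiliar with the three operations, the following plain-Python reference sketch shows what each one computes. It says nothing about the ESIMD kernels themselves. It assumes the sparse matrix is stored in CSR form (`indptr`, `indices`, `data`), that dense matrices are lists of rows, and that FusedMM multiplies the SDDMM result by the same dense matrix `Y`; the function names and that last choice are illustrative assumptions, not taken from the paper.

```python
def spmm(indptr, indices, data, B):
    """SPMM: C = A @ B, with A sparse (CSR) and B dense."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j, a = indices[k], data[k]
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C

def sddmm(indptr, indices, data, X, Y):
    """SDDMM: values of S * (X @ Y^T), evaluated only at the nonzeros of S.

    The result has the same sparsity pattern as S, so only the new
    value array is returned.
    """
    out = []
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            out.append(data[k] * sum(x * y for x, y in zip(X[i], Y[j])))
    return out

def fusedmm(indptr, indices, data, X, Y):
    """SDDMM followed by SPMM with Y, without storing the intermediate values."""
    n_cols = len(Y[0])
    C = [[0.0] * n_cols for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            # Sampled dot product for nonzero (i, j), used immediately.
            w = data[k] * sum(x * y for x, y in zip(X[i], Y[j]))
            for c in range(n_cols):
                C[i][c] += w * Y[j][c]
    return C

# S = [[1, 0, 2],
#      [0, 3, 0]] in CSR form
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
X = [[1.0, 2.0], [3.0, 4.0]]              # 2 x 2 dense
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2 dense

sampled = sddmm(indptr, indices, data, X, Y)
two_step = spmm(indptr, indices, sampled, Y)
fused = fusedmm(indptr, indices, data, X, Y)
```

The fused version produces the same result as running `sddmm` and then `spmm`, but each sampled value is consumed as soon as it is computed, which avoids writing and re-reading the intermediate sparse matrix.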