LG DC Jan 2, 2023

GPU accelerated matrix factorization of large scale data using block based approach

arXiv:2304.13724v1 2.0 h-index: 29

Originality: Incremental advance

AI Analysis

This work addresses computational bottlenecks for researchers and practitioners working with large-scale matrix factorization tasks, though it appears to be an incremental improvement on existing GPU acceleration techniques.

The authors tackled the problem of slow matrix factorization on large datasets by developing a block-based approach using stochastic gradient descent that enables GPU acceleration despite memory limitations. Their method achieved comparable RMSE results to CPU-based variants while providing significant speed improvements.

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs requires alternative techniques that not only allow parallelism but also address memory limitations. Synchronization between computational units, isolation of data related to a computational unit, sharing of data between computational units, and identification of independent tasks among computational units are some of the challenges in leveraging GPUs for MF. We propose a block-based approach to matrix factorization using Stochastic Gradient Descent (SGD) that is aimed at accelerating MF on GPUs. The primary motivation for the approach is to make it viable to factorize extremely large data sets on limited hardware without having to compromise on results. The approach addresses factorization of large-scale data by identifying independent blocks, each of which is factorized in parallel using multiple computational units. The approach can be extended to one or more GPUs and even to distributed systems. The RMSE results of the block-based approach are within an acceptable delta of the results of the CPU-based variant and the multi-threaded CPU variant of a similar SGD kernel implementation. The speed advantage of the block-based variant over the other variants is significant.
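The abstract does not spell out how independent blocks are identified, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes the rating matrix is cut into a `num_blocks` x `num_blocks` grid and that blocks on the same wrapped diagonal are processed together, since they share no user rows and no item columns. All names (`independent_blocks`, `sgd_block`, `factorize`, `rmse`) and hyperparameters are hypothetical, and the blocks are run one after another on the CPU here purely to show the schedule.

```python
import random


def independent_blocks(num_blocks, step):
    """Return the grid blocks (row_block, col_block) on one wrapped diagonal.

    No two blocks in the result share a row range or a column range, so
    their SGD updates touch disjoint parts of P and Q.
    """
    return [(i, (i + step) % num_blocks) for i in range(num_blocks)]


def sgd_block(ratings, P, Q, lr, reg):
    """Run plain SGD over the ratings of one block, updating P and Q in place."""
    for u, i, r in ratings:
        pu, qi = P[u], Q[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        for k in range(len(pu)):
            pu_k = pu[k]  # keep the old value so both updates use it
            pu[k] += lr * (err * qi[k] - reg * pu_k)
            qi[k] += lr * (err * pu_k - reg * qi[k])


def rmse(ratings, P, Q):
    """Root mean squared error of the factorization on the given ratings."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, rank=4, num_blocks=2,
              epochs=100, lr=0.02, reg=0.01, seed=0):
    """Block-scheduled SGD matrix factorization (sequential illustration)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]

    # Assign every rating (user, item, value) to its grid block.
    blocks = {}
    for u, i, r in ratings:
        key = (u * num_blocks // n_users, i * num_blocks // n_items)
        blocks.setdefault(key, []).append((u, i, r))

    for _ in range(epochs):
        for step in range(num_blocks):
            # The blocks of one step are mutually independent. They run
            # sequentially here; on a GPU each could be handed to its own
            # computational unit (an assumption, not the paper's kernel).
            for key in independent_blocks(num_blocks, step):
                sgd_block(blocks.get(key, []), P, Q, lr, reg)
    return P, Q
```

Calling `factorize(ratings, n_users, n_items)` with a list of `(user, item, rating)` triples returns the two factor matrices, and `rmse(ratings, P, Q)` gives the kind of error figure the abstract compares across variants. Because only one diagonal of blocks needs to be resident at a time, a schedule like this is also one way to work within limited device memory.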
