DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters
This work targets the computational bottleneck of deep learning for researchers and practitioners who train on commodity clusters. The contribution appears incremental, since it builds on existing frameworks such as Spark and Caffe/TensorFlow.
The authors tackle the challenge of training deep neural networks on large-scale data by proposing DeepSpark, a distributed deep learning framework that integrates Apache Spark with Caffe/TensorFlow. Parallel training relies on a lock-free asynchronous update scheme.
The increasing complexity of deep neural networks (DNNs) has made it challenging to use existing large-scale data processing pipelines to handle the massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning framework that exploits Apache Spark on commodity clusters. To support parallel operations, DeepSpark uses Spark to automatically distribute workloads and parameters to nodes running Caffe/TensorFlow, and iteratively aggregates training results with a novel lock-free asynchronous variant of the popular elastic averaging stochastic gradient descent (EASGD) update scheme, which complements the synchronized processing capabilities of Spark. DeepSpark is an ongoing project, and the current release is available at http://deepspark.snu.ac.kr.
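To make the elastic averaging idea concrete, the following is a minimal sketch of the standard asynchronous EASGD rule on a toy quadratic loss. It is not DeepSpark's implementation: the function names (`elastic_exchange`, `simulate`), the hyperparameters, and the loss are all assumptions chosen for illustration. Workers are simulated sequentially in a single process, so the lock-free and Spark-based aspects described above are not modeled.

```python
import random


def elastic_exchange(worker, center, alpha):
    """Elastic step: move a worker and the center toward each other in place.

    Both vectors shift by alpha * (worker - center), in opposite directions.
    """
    for i in range(len(worker)):
        pull = alpha * (worker[i] - center[i])
        worker[i] -= pull
        center[i] += pull


def simulate(target, num_workers=4, ticks=2000, tau=5,
             lr=0.1, alpha=0.2, noise=0.1, seed=0):
    """Toy asynchronous EASGD on the loss 0.5 * ||x - target||^2.

    All parameter names and defaults are hypothetical. A random worker is
    chosen at each tick to imitate workers progressing at different rates.
    """
    rng = random.Random(seed)
    dim = len(target)
    center = [0.0] * dim
    workers = [[0.0] * dim for _ in range(num_workers)]
    local_steps = [0] * num_workers

    for _ in range(ticks):
        k = rng.randrange(num_workers)
        x = workers[k]
        # Local SGD step with a noisy gradient of the quadratic loss.
        for i in range(dim):
            grad = (x[i] - target[i]) + rng.gauss(0.0, noise)
            x[i] -= lr * grad
        local_steps[k] += 1
        # Every tau local steps, this worker alone exchanges with the center.
        if local_steps[k] % tau == 0:
            elastic_exchange(x, center, alpha)
    return center, workers


if __name__ == "__main__":
    center, _ = simulate([1.0, -2.0, 0.5])
    print("center estimate:", center)
```

The communication period `tau` controls how often a worker touches the shared parameters. A larger value reduces communication at the cost of letting workers drift further from the center between exchanges.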