DC, AI, LG · Apr 16, 2018

BigDL: A Distributed Deep Learning Framework for Big Data

Jason Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Zhang, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang

arXiv:1804.05839v4 · 15.2 · 108 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

It addresses the challenge industry users face in building and managing deep learning applications on existing big data platforms, though its contribution is incremental, since it extends Spark's existing capabilities rather than introducing a new compute model.

The paper tackles the problem of integrating deep learning with big data processing by introducing BigDL, a distributed deep learning framework built on Apache Spark, enabling direct processing of production data in Hadoop/Spark clusters as part of end-to-end pipelines.

This paper presents BigDL (a distributed deep learning framework for Apache Spark), which has been used by a variety of users in industry to build deep learning applications on production big data platforms. It allows deep learning applications to run on Apache Hadoop/Spark clusters, so that they can directly process production data and be deployed and managed as part of the end-to-end data analysis pipeline. Unlike existing deep learning frameworks, BigDL implements distributed, data-parallel training directly on top of Spark's functional compute model (with copy-on-write and coarse-grained operations). We also share real-world experience and "war stories" from users who have adopted BigDL to address their challenges (i.e., how to easily build end-to-end data analysis and deep learning pipelines for their production data).
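To make "data-parallel training on top of a functional compute model" more concrete, here is a minimal, hypothetical sketch in plain Python. It simulates Spark partitions with ordinary lists and uses no Spark or BigDL API; it also does not reproduce BigDL's actual parameter-synchronization scheme. Each iteration maps a gradient computation over the partitions, reduces the partial results, and returns a new, immutable model instead of mutating shared state, loosely mirroring the coarse-grained, copy-on-write style the abstract describes. The linear model and the names `local_gradient`, `combine`, `train_step`, and `train` are illustrative assumptions introduced here.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical sketch (not BigDL's API): a linear model y = w * x + b trained
# with synchronous data-parallel gradient descent. Each "partition" stands in
# for a Spark partition; no Spark or BigDL code is used.

Sample = Tuple[float, float]  # (x, y)
Model = Tuple[float, float]   # (w, b); immutable, so each step yields a new model
Partial = Tuple[float, float, int]  # (grad_w, grad_b, sample_count)


def local_gradient(model: Model, partition: List[Sample]) -> Partial:
    """Map step: summed squared-error gradient over one partition."""
    w, b = model
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(partition)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: sum partial gradients and sample counts."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def train_step(model: Model, partitions: List[List[Sample]], lr: float) -> Model:
    """One synchronous iteration: map over partitions, reduce, build a new model."""
    gw, gb, n = reduce(combine, (local_gradient(model, p) for p in partitions))
    w, b = model
    # Copy-on-write style: the input model is never modified.
    return w - lr * gw / n, b - lr * gb / n


def train(partitions: List[List[Sample]], lr: float = 0.1, iterations: int = 1000) -> Model:
    model: Model = (0.0, 0.0)
    for _ in range(iterations):
        model = train_step(model, partitions, lr)
    return model


if __name__ == "__main__":
    # Synthetic data from y = 3x + 1, split across three "partitions".
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(30)]
    partitions = [data[0:10], data[10:20], data[20:30]]
    w, b = train(partitions)
    print(f"w={w:.3f}, b={b:.3f}")
```

Because the reduce sums gradients and sample counts before dividing, each update equals a full-batch gradient step no matter how the data is partitioned; only the map and reduce work is distributed.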

