DC · AI · LG · Apr 16, 2018

BigDL: A Distributed Deep Learning Framework for Big Data

arXiv:1804.05839v4 · 108 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses a challenge faced by industry users: building and managing deep learning applications on existing big data platforms. Its contribution is incremental, in that it extends Spark's capabilities rather than introducing a new platform.

The paper tackles the problem of integrating deep learning with big data processing by introducing BigDL, a distributed deep learning framework built on Apache Spark, enabling direct processing of production data in Hadoop/Spark clusters as part of end-to-end pipelines.

This paper presents BigDL (a distributed deep learning framework for Apache Spark), which has been used by a variety of users in the industry for building deep learning applications on production big data platforms. It allows deep learning applications to run on the Apache Hadoop/Spark cluster so as to directly process the production data, and as a part of the end-to-end data analysis pipeline for deployment and management. Unlike existing deep learning frameworks, BigDL implements distributed, data parallel training directly on top of the functional compute model (with copy-on-write and coarse-grained operations) of Spark. We also share real-world experience and "war stories" of users that have adopted BigDL to address their challenges (i.e., how to easily build end-to-end data analysis and deep learning pipelines for their production data).
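
To make the idea of data-parallel training on a functional compute model more concrete, the following is a minimal sketch in plain Python. It is not BigDL's API and does not use Spark; it only imitates the pattern the abstract describes. The assumptions are: a toy linear model, in-memory lists standing in for data partitions, and hypothetical helper names (`partition_gradient`, `add_gradients`, `train`). Each step runs one coarse-grained task per partition against read-only weights, reduces the partial gradients, and then builds a new weights value instead of mutating the old one (copy-on-write).

```python
from functools import reduce
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]    # (x, y) pair
Weights = Tuple[float, float]   # (slope, intercept); a tuple, so it cannot be mutated in place
Grad = Tuple[float, float, int] # (d/d_slope, d/d_intercept, sample count)


def partition_gradient(weights: Weights, part: Sequence[Sample]) -> Grad:
    """Coarse-grained task: summed squared-error gradient over one whole partition."""
    w, b = weights
    gw = gb = 0.0
    for x, y in part:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(part)


def add_gradients(a: Grad, b: Grad) -> Grad:
    """Associative combine step used by the reduce phase."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def train(partitions: List[Sequence[Sample]], steps: int = 2000, lr: float = 0.1) -> Weights:
    weights: Weights = (0.0, 0.0)
    for _ in range(steps):
        # "map": every partition sees the same read-only weights
        grads = [partition_gradient(weights, p) for p in partitions]
        # "reduce": aggregate partial gradients into one global gradient
        gw, gb, n = reduce(add_gradients, grads)
        # copy-on-write: create a new weights value rather than updating in place
        weights = (weights[0] - lr * gw / n, weights[1] - lr * gb / n)
    return weights


# Toy data drawn from y = 2x + 1, split into two partitions (an illustrative assumption).
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(20)]
partitions = [data[:10], data[10:]]
slope, intercept = train(partitions)
```

Because the gradients are summed before the update, the result does not depend (up to floating-point rounding) on how the samples are split across partitions, which is the property that lets synchronous data-parallel training be expressed as ordinary map and reduce operations.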

Code Implementations: 1 repo
