DC, LG · Apr 11, 2018

Flexible and Scalable Deep Learning with MMLSpark

arXiv:1804.04031v1 · 12 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work enables scalable deep learning for domain-specific applications like wildlife conservation, though it is incremental in combining existing tools.

The authors developed MMLSpark, an open-source library that integrates Cognitive Toolkit and Apache Spark for distributed deep learning, and applied it to classify Snow Leopards from camera trap images, providing an end-to-end solution for conservation efforts.

In this work we detail a novel open-source library, called MMLSpark, that combines the flexible deep learning library Cognitive Toolkit with the distributed computing framework Apache Spark. To achieve this, we have contributed Java language bindings to the Cognitive Toolkit and added several new components to the Spark ecosystem. We also integrate the popular image processing library OpenCV with Spark, and present a tool that automatically generates PySpark wrappers from any SparkML estimator, which we use to expose all of this work to the PySpark ecosystem. Finally, we provide a large library of tools for working and developing within the Spark ecosystem. We apply this work to the automated classification of Snow Leopards from camera trap images, and provide an end-to-end solution for the non-profit conservation organization, the Snow Leopard Trust.
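The wrapper-generation idea in the abstract can be illustrated with a minimal sketch in plain Python, without Spark. This is not MMLSpark's generator: `ParamSpec`, `generate_wrapper`, `ImageClassifier`, and `com.example.ImageClassifier` are hypothetical names chosen for illustration, and the parameter list is written by hand here rather than read from a real estimator. The sketch only shows the general technique of emitting a fluent setter and a getter for each declared parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParamSpec:
    """Hypothetical description of one estimator parameter."""
    name: str
    doc: str


def generate_wrapper(class_name: str, java_class: str, params: List[ParamSpec]) -> str:
    """Return Python source for a thin wrapper class with a setter and getter per parameter."""
    lines = [
        f"class {class_name}:",
        f'    """Auto-generated wrapper for {java_class}."""',
        "",
        "    def __init__(self):",
        "        self._params = {}",
    ]
    for p in params:
        cap = p.name[0].upper() + p.name[1:]  # inputCol -> InputCol
        lines += [
            "",
            f"    def set{cap}(self, value):",
            f'        """Set {p.name}: {p.doc}"""',
            f"        self._params[{p.name!r}] = value",
            "        return self",  # returning self allows chained setters
            "",
            f"    def get{cap}(self):",
            f"        return self._params[{p.name!r}]",
        ]
    return "\n".join(lines) + "\n"


# Assumed example input: a made-up estimator with two column parameters.
source = generate_wrapper(
    "ImageClassifier",
    "com.example.ImageClassifier",
    [
        ParamSpec("inputCol", "name of the input column"),
        ParamSpec("outputCol", "name of the output column"),
    ],
)

# Load the generated source and use the wrapper like a hand-written class.
namespace = {}
exec(source, namespace)
clf = namespace["ImageClassifier"]().setInputCol("image").setOutputCol("label")
```

Generating source text rather than building classes dynamically keeps the output inspectable and easy to ship as ordinary Python files, which is one plausible reason to prefer this design for a wrapper layer.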

Code Implementations: 1 repo