LG · CV · ML · Apr 16, 2015

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

arXiv:1504.04343v2 · 73 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses slow training times for deep learning practitioners using the Caffe framework. Its contribution is incremental: it applies existing optimizations rather than introducing new methods.

The authors tackled the performance bottleneck in training convolutional neural networks by rebuilding the Caffe framework internals with standard batching optimizations for CPUs, achieving a 4.5x throughput improvement over Caffe on networks like CaffeNet and enabling efficient hybrid CPU-GPU training.

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.
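The abstract refers only to "standard batching optimizations" and does not spell them out. The following is a minimal sketch of one common reading, assuming convolution is lowered to a matrix multiply (im2col) and the lowered images of a batch are concatenated so that a single large multiply replaces many small ones. This is not CcT's code: the function names `im2col`, `conv_per_image`, and `conv_batched` are hypothetical, and the sketch uses stride 1 with no padding for brevity.

```python
import numpy as np

def im2col(x, k):
    """Lower one image of shape (C, H, W) to a (C*k*k, out_h*out_w) matrix.

    Assumes stride 1 and no padding (an illustrative simplification).
    """
    c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(k):
            for j in range(k):
                # Each row holds one kernel offset applied at every output position.
                cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv_per_image(batch, weights):
    """One small matrix multiply per image in the batch."""
    n, c, h, w = batch.shape
    f, _, k, _ = weights.shape
    out_h, out_w = h - k + 1, w - k + 1
    w_mat = weights.reshape(f, c * k * k)
    out = np.empty((n, f, out_h, out_w), dtype=batch.dtype)
    for b in range(n):
        out[b] = (w_mat @ im2col(batch[b], k)).reshape(f, out_h, out_w)
    return out

def conv_batched(batch, weights):
    """Lower every image first, then run a single large matrix multiply."""
    n, c, h, w = batch.shape
    f, _, k, _ = weights.shape
    out_h, out_w = h - k + 1, w - k + 1
    w_mat = weights.reshape(f, c * k * k)
    # Columns are ordered image by image: shape (C*k*k, n*out_h*out_w).
    lowered = np.concatenate([im2col(img, k) for img in batch], axis=1)
    result = w_mat @ lowered  # shape (f, n*out_h*out_w)
    return result.reshape(f, n, out_h, out_w).transpose(1, 0, 2, 3)

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 3, 8, 8))    # 4 images, 3 channels, 8x8
weights = rng.standard_normal((5, 3, 3, 3))  # 5 filters of size 3x3
same = np.allclose(conv_per_image(batch, weights), conv_batched(batch, weights))
```

Both functions compute the same output. The batched version simply hands the matrix-multiply routine one larger problem, which is the kind of workload on which a multithreaded CPU BLAS tends to reach a higher fraction of its peak FLOPS.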

Code Implementations: 1 repo
