LG · CV · Dec 1, 2017

Probabilistic Adaptive Computation Time

arXiv:1712.00386v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work targets computational efficiency and resource use in deep learning, which matters to practitioners. It is an incremental advance because it builds on prior adaptive computation techniques, notably Adaptive Computation Time.

The paper addresses adaptive computation time in deep learning models such as ResNets and LSTMs by introducing a probabilistic model with discrete latent variables. On ResNet, it matches the speed-accuracy trade-off of Adaptive Computation Time while allowing deterministic evaluation with a lower memory footprint.

We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
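
To make the Concrete relaxation mentioned in the abstract more tangible, the sketch below draws a relaxed sample of a single binary variable: logistic noise is added to the logit and the sum is passed through a temperature-scaled sigmoid. It illustrates the general technique only and is not the paper's implementation. The function names (`sample_binary_concrete`, `most_probable_value`), the single-variable setup, and the temperature values are assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_binary_concrete(logit, temperature, rng):
    """Relaxed sample of a Bernoulli variable with the given logit.

    Returns a value in [0, 1]; lower temperatures push samples toward 0 or 1.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    # Clamp the uniform draw so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + logistic_noise) / temperature)


def most_probable_value(logit):
    """Mode of the underlying Bernoulli: 1 if logit > 0, else 0.

    Assumption: stands in for a deterministic choice at evaluation time.
    """
    return 1 if logit > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    for temperature in (1.0, 0.1):
        draws = [sample_binary_concrete(0.5, temperature, rng) for _ in range(5)]
        print(temperature, [round(d, 3) for d in draws])
    print("deterministic choice:", most_probable_value(0.5))
```

During training, the relaxed samples are differentiable with respect to the logit, which is what makes gradient-based optimization of a discrete choice possible. At evaluation time, a hard choice such as the mode needs no sampling, in the spirit of the deterministic procedure the abstract describes.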

