LG · CL · MS · PL · Oct 12, 2021

A Brief Introduction to Automatic Differentiation for Machine Learning

arXiv:2110.06209v2
Originality: Synthesis-oriented
AI Analysis

This is an incremental introduction for practitioners in machine learning, explaining existing AD methods without proposing new advancements.

The report introduces automatic differentiation (AD), the technique that machine learning frameworks use to calculate derivatives for gradient-based optimization, which relieves model designers of deriving them by hand. It covers AD's motivations and implementation approaches, and gives examples in TensorFlow and PyTorch.

Machine learning, and neural network models in particular, have been improving state-of-the-art performance on many artificial-intelligence-related tasks. Neural network models are typically implemented using frameworks that apply gradient-based optimization methods to fit a model to a dataset. These frameworks calculate derivatives with a technique called automatic differentiation (AD), which removes the burden of performing derivative calculations from the model designer. In this report we describe AD, its motivations, and different implementation approaches. We briefly describe dataflow programming as it relates to AD. Lastly, we present example programs implemented with TensorFlow and PyTorch, two commonly used AD frameworks.
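To make the idea of AD concrete, here is a minimal sketch of forward-mode AD using dual numbers in plain Python. It is not taken from the report and is not how TensorFlow or PyTorch are implemented (both primarily use reverse-mode AD). The names `Dual`, `_lift`, `sin`, `derivative`, and `f` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number: a value paired with its derivative."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, whose derivative is zero."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate f at x with a seed derivative of 1 and read off df/dx."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    # The model designer writes only the function, never its derivative.
    return x * x + 3 * x + sin(x)


if __name__ == "__main__":
    print(derivative(f, 2.0))  # analytically, f'(x) = 2x + 3 + cos(x)
```

The derivative is propagated alongside the value through each elementary operation, so `f` needs no hand-written derivative. This is the burden that AD frameworks remove from the model designer.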
