The Elements of Differentiable Programming
This foundational work addresses a broad problem in AI and computer science by providing a comprehensive framework for designing and optimizing differentiable programs. The contribution is not incremental: it establishes a new paradigm.
The book tackles the challenge of enabling gradient-based optimization of complex computer programs, including those with control flows and data structures. It presents differentiable programming as a new paradigm that allows end-to-end differentiation of such programs. Making a program differentiable inherently introduces probability distributions over its execution, which provide a way to quantify the uncertainty in program outputs.
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two. Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
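To make the last two ideas concrete, here is a minimal sketch in plain Python, not taken from the book. It assumes a toy forward-mode differentiator built on dual numbers and a sigmoid relaxation of an if/else branch; the names `Dual`, `lift`, `hard_branch`, `soft_branch`, `loss`, `derivative` and `fit` are all hypothetical, chosen only for illustration. The hard branch is piecewise constant in the parameter `w`, so its gradient is zero and gradient descent cannot move. The soft branch returns the expected output when the branch is taken with probability sigmoid(w), which is one simple way a distribution over program execution can arise from designing a program for differentiation.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative with respect to one scalar input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sigmoid(z):
    s = 1.0 / (1.0 + math.exp(-z.val))
    return Dual(s, s * (1.0 - s) * z.dot)


def hard_branch(w, a, b):
    """Ordinary control flow: the output does not vary smoothly with w."""
    return lift(a) if w.val > 0 else lift(b)


def soft_branch(w, a, b):
    """Expected output when branch `a` is taken with probability sigmoid(w)."""
    pi = sigmoid(w)
    return pi * a + (1.0 - pi) * b


def loss(program, w):
    # Illustrative objective: squared distance of the program output to 3.
    r = program(w, 4.0, 0.0) - 3.0
    return r * r


def derivative(f, w):
    """Forward-mode derivative of f at w: seed the input with dot = 1."""
    return f(Dual(w, 1.0)).dot


def fit(program, w=-1.0, lr=0.1, steps=500):
    """Plain gradient descent on the parameter w."""
    for _ in range(steps):
        w -= lr * derivative(lambda v: loss(program, v), w)
    return w


w_hard = fit(hard_branch)  # gradient is zero, so w never leaves its start
w_soft = fit(soft_branch)  # moves toward the w where 4 * sigmoid(w) = 3
print(w_hard, w_soft)
```

With these illustrative constants, the relaxed loss is minimized where sigmoid(w) = 0.75, that is at w = log 3, while the hard version stays at its initial value. The sigmoid is only one possible relaxation; other smoothing distributions would give different soft programs.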