Matrix Calculus (for Machine Learning and Beyond)
This work provides foundational knowledge for students in machine learning and related fields, but its contribution is incremental, since it presents existing concepts in an educational format.
The paper introduces an undergraduate course that extends differential calculus to functions on general vector spaces. It focuses on practical applications in large-scale optimization and machine learning, and covers topics such as adjoint (reverse-mode) differentiation and automatic differentiation techniques.
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.
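As a small illustration of the kind of derivative described above, consider the function that maps a matrix A to its inverse. A standard result (not quoted from the abstract itself) is that a small perturbation dA changes the inverse by d(A^{-1}) = -A^{-1} dA A^{-1}. The sketch below checks this identity against a central finite difference for a 2x2 matrix in plain Python. The helper names (`matmul`, `inv2`, `inverse_differential`, `finite_difference`) and the sample matrices are hypothetical choices made for this example, not part of the course material.

```python
# Minimal sketch: verify d(A^{-1}) = -A^{-1} dA A^{-1} numerically for a 2x2 matrix.
# All helper names and sample values here are illustrative assumptions.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def inv2(A):
    """Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add_scaled(A, dA, t):
    """Return A + t * dA entrywise."""
    return [[A[i][j] + t * dA[i][j] for j in range(2)] for i in range(2)]

def inverse_differential(A, dA):
    """Linearized change of A^{-1} in the direction dA: -A^{-1} dA A^{-1}."""
    A_inv = inv2(A)
    product = matmul(matmul(A_inv, dA), A_inv)
    return [[-product[i][j] for j in range(2)] for i in range(2)]

def finite_difference(A, dA, h=1e-6):
    """Central-difference approximation of the same directional derivative."""
    plus = inv2(add_scaled(A, dA, h))
    minus = inv2(add_scaled(A, dA, -h))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]      # invertible: determinant is 5
dA = [[0.5, -1.0], [2.0, 0.25]]   # arbitrary perturbation direction

exact = inverse_differential(A, dA)
approx = finite_difference(A, dA)
max_err = max(abs(exact[i][j] - approx[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise discrepancy:", max_err)
```

The point of the sketch is that the derivative here is a linear map acting on the matrix perturbation dA, rather than a single number or a list of partial derivatives. That is the sense in which the derivative is generalized beyond elementary calculus.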