Self Normalizing Flows
Self Normalizing Flows address a core bottleneck in normalizing flow models, the cost of computing gradients of the Jacobian determinant, and offer machine learning practitioners a practical improvement for density estimation tasks.
The paper tackles the cost of computing gradients of the Jacobian determinant in normalizing flows. It introduces Self Normalizing Flows, which replace the expensive gradient terms with learned approximate inverses at each layer, reducing the complexity of each layer's update from O(D^3) to O(D^2). This makes it feasible to train architectures that were previously too expensive. In experiments, the models reach data likelihood values similar to those of their exact-gradient counterparts while training faster, and they outperform functionally constrained models.
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict themselves to a function class whose Jacobian determinant is easy to evaluate, or rely on an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient with learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.
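To make the idea of replacing expensive gradient terms with a learned approximate inverse concrete, the following is a minimal sketch for a single dense linear layer $z = Wx$, not the authors' implementation. It relies on the standard identity that the gradient of $\log|\det W|$ with respect to $W$ is $W^{-T}$, which requires an $\mathcal{O}(D^3)$ matrix inversion. If a second matrix $R$ is trained so that $R \approx W^{-1}$, the same term can be approximated by $R^T$ with no inversion. The names `W`, `R`, the function names, and the squared-error reconstruction penalty are all assumptions chosen for illustration.

```python
import numpy as np

def exact_logdet_grad(W):
    """Exact gradient of log|det W| w.r.t. W: the inverse transpose (O(D^3))."""
    return np.linalg.inv(W).T

def self_normalized_logdet_grad(R):
    """Approximate gradient using a learned inverse R ~ W^{-1} (hypothetical name).
    Only a transpose is needed, so no matrix inversion is performed."""
    return R.T

def reconstruction_loss(W, R, x):
    """Assumed penalty that pushes R toward W^{-1}: mean ||R(Wx) - x||^2 over a batch."""
    z = x @ W.T          # forward pass, one row per sample
    x_rec = z @ R.T      # approximate inverse pass
    return np.mean(np.sum((x_rec - x) ** 2, axis=1))

rng = np.random.default_rng(0)
D = 8
W = np.eye(D) + 0.1 * rng.standard_normal((D, D))        # well-conditioned forward weights
R_exact = np.linalg.inv(W)                               # ideal inverse, for comparison only
R_noisy = R_exact + 1e-3 * rng.standard_normal((D, D))   # stand-in for an imperfectly learned inverse
x = rng.standard_normal((32, D))

grad_exact = exact_logdet_grad(W)
grad_approx = self_normalized_logdet_grad(R_noisy)
gap = np.max(np.abs(grad_exact - grad_approx))           # shrinks as R approaches W^{-1}
```

In this sketch the quality of the approximate gradient depends entirely on how close `R` is to the true inverse, which is why a reconstruction term of some form is needed during training to keep the two matrices consistent.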