Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, Not Your Loss Function
This work reframes loss selection as a modeling decision, which may simplify theoretical and applied work in areas such as optimization and inference.
The paper tackles the problem of choosing loss functions by showing that many standard losses arise as the inconsistency of probabilistic dependency graphs (PDGs), providing a unified framework. It demonstrates that this approach justifies connections between regularizers and priors, captures statistical divergences, and simplifies variational inference by deriving objectives like the ELBO from modeling assumptions.
In a world blessed with a great diversity of loss functions, we argue that the choice between them is not a matter of taste or pragmatics, but of model. Probabilistic dependency graphs (PDGs) are probabilistic models that come equipped with a measure of "inconsistency". We prove that many standard loss functions arise as the inconsistency of a natural PDG describing the appropriate scenario, and use the same approach to justify a well-known connection between regularizers and priors. We also show that the PDG inconsistency captures a large class of statistical divergences, and detail the benefits of thinking of them in this way, including an intuitive visual language for deriving inequalities between them. In variational inference, we find that the ELBO, a somewhat opaque objective for latent variable models, and variants of it arise for free out of uncontroversial modeling assumptions, as do simple graphical proofs of their corresponding bounds. Finally, we observe that inconsistency becomes the log partition function (free energy) in the setting where PDGs are factor graphs.
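The well-known connection between regularizers and priors can be checked numerically in a small case. The sketch below is an illustration only and does not reproduce the PDG construction. It assumes a one-dimensional linear model y = w*x with Gaussian noise and a zero-mean Gaussian prior on w. The function names, the data, and the values of noise_sigma and prior_sigma are all hypothetical. Under these assumptions, the negative log posterior differs from the L2-regularized squared-error loss only by an additive constant and a positive scale factor, so both objectives rank candidate weights identically.

```python
import math


def gaussian_neg_log_density(x, mean, sigma):
    """Negative log density of a Gaussian N(mean, sigma^2) evaluated at x."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mean) ** 2 / (2 * sigma ** 2)


def neg_log_posterior(w, xs, ys, noise_sigma, prior_sigma):
    """Unnormalized negative log posterior of w: -log p(ys | xs, w) - log p(w).

    Assumed model (hypothetical, for illustration): y = w * x + Gaussian noise,
    with prior w ~ N(0, prior_sigma^2).
    """
    nll = sum(gaussian_neg_log_density(y, w * x, noise_sigma) for x, y in zip(xs, ys))
    return nll + gaussian_neg_log_density(w, 0.0, prior_sigma)


def ridge_loss(w, xs, ys, lam):
    """Sum of squared errors plus an L2 penalty lam * w^2."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2


# Hypothetical toy data and hyperparameters.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]
noise_sigma, prior_sigma = 0.5, 2.0

# The prior width determines the regularization strength.
lam = noise_sigma ** 2 / prior_sigma ** 2

# Differences between two candidate weights agree up to the factor 1 / (2 * noise_sigma^2);
# the additive normalization constants cancel in the difference.
w_a, w_b = 0.7, 1.3
posterior_gap = (neg_log_posterior(w_a, xs, ys, noise_sigma, prior_sigma)
                 - neg_log_posterior(w_b, xs, ys, noise_sigma, prior_sigma))
ridge_gap = (ridge_loss(w_a, xs, ys, lam) - ridge_loss(w_b, xs, ys, lam)) / (2 * noise_sigma ** 2)
print(math.isclose(posterior_gap, ridge_gap))
```

In this sketch the regularization strength is fixed by the model: a narrower prior (smaller prior_sigma) gives a larger lam, which is one concrete sense in which choosing a regularizer amounts to choosing a prior.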