Irina Babayan

LG
h-index2
3papers
3citations
Novelty63%
AI Score43

3 Papers

21.9LGJun 4
Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs (DAGs). However, typical techniques for constructing Bayesian networks rely on optimization, which can be ill-suited for learning causal relationships because the underlying data may admit multiple chains of causation. More data-faithful representations of causal relationships would provide frameworks for constructing multiple causal maps that are consistent with the variability that is inherent in underlying data. Here, we show that entropy-based inference generates atlases of plausible causal relationships that are consistent with underlying data. On simulated noisy data of 2- and 20-node linear structural equation models, we sample a maximum-entropy ensemble of graphs that allow us to quantify the inherent structural ambiguity in underlying causal relationships. Our method shows that "optimized" DAGs can contain causal artifacts are not consistent across equivalently accurate topologies.

16.1LGMay 15
Entropic Auto-Encoding via Implicit Free-Energy Minimization

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

Despite their ubiquity, variational autoencoders (VAEs) inherently suffer from posterior collapse, a failure mode in which latent variables are effectively ignored. This failure arises because explicit prior imposition drives optimization toward loss landscape regions corresponding to uninformative latent representations. Here, we introduce Entropic Autoencoders (EAEs), a framework in which reconstruction loss is the only explicit objective, and entropy generates the latent variables' prior implicitly through a free energy-minimizing ensemble of encoders. This ensemble biases learning toward high-volume regions of near-optimal solutions, while decoder updates direct the search trajectories toward informative latent representations. We demonstrate that EAEs mitigate posterior collapse by learning non-Gaussian, multimodal latent distributions that yield diverse, data-consistent generations and preserve different forms of underlying structure in the data. As a proof-of-concept, we show that an EAE captures a superposition of the known low-dimensional dynamics of a reaction-diffusion process. Then, we show that an EAE identifies implicit categorical distinctions in MNIST latent representations, and displays a hierarchical understanding of facial structure on the CelebA dataset, from an "all-human" face to individual-dependent features.

LGOct 25, 2024
Simmering: Sufficient is better than optimal for training neural networks

Irina Babayan, Hazhir Aliahmadi, Greg van Anders

The broad range of neural network training techniques that invoke optimization but rely on ad hoc modification for validity suggests that optimization-based training is misguided. Shortcomings of optimization-based training are brought to particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. The broad success of neural networks for modelling physical processes has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right. These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. Here, we introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely ``good enough'', but which, paradoxically, outperforms leading optimization-based approaches. Using classification and regression examples we show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset. Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms that do not take optimization as their starting point.