LG · ML · Jun 5, 2024

Embarrassingly Parallel GFlowNets

arXiv:2406.03288v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses scalability issues in GFlowNet applications, which matters to researchers and practitioners working with large or distributed datasets. The advance is incremental, since it builds on existing GFlowNet frameworks.

The paper tackles the challenge of training GFlowNets for large-scale or distributed data by proposing EP-GFlowNet, a divide-and-conquer method that reduces computational cost and communication overhead, achieving efficient sampling in tasks like parallel Bayesian phylogenetics and federated structure learning.

GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it requires traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) \cdots R_N(\cdot)$ -- e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed aggregating balance condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the effectiveness of EP-GFlowNets on many tasks, including parallel Bayesian phylogenetics, multi-objective multiset and sequence generation, and federated Bayesian structure learning.
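The product form $R(\cdot) \propto R_1(\cdot) \cdots R_N(\cdot)$ can be checked numerically on a toy problem. The sketch below is not the paper's method and trains no GFlowNet; it only illustrates why a product of local posteriors recovers the full posterior. It assumes a hypothetical coin-bias model on a five-point grid, hand-picked data, and a prior split evenly across clients (each local target uses the prior raised to the power 1/N, a common choice in embarrassingly parallel inference that the abstract does not spell out).

```python
import math

def log_likelihood(theta, flips):
    """Bernoulli log-likelihood of a list of 0/1 coin flips."""
    return sum(math.log(theta if f == 1 else 1.0 - theta) for f in flips)

def normalize(log_weights):
    """Turn unnormalized log-weights into probabilities (max-shifted for stability)."""
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical discrete parameter space and uniform prior (assumptions).
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
log_prior = [math.log(1.0 / len(thetas))] * len(thetas)

# Hypothetical data set, partitioned across N = 3 clients.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
shards = [flips[0:4], flips[4:8], flips[8:12]]
N = len(shards)

# Local unnormalized log-rewards: log R_n = (1/N) log prior + local log-likelihood.
local_log_rewards = [
    [lp / N + log_likelihood(t, shard) for t, lp in zip(thetas, log_prior)]
    for shard in shards
]

# Product of local rewards is a sum in log space: log R = sum_n log R_n.
product_log_reward = [sum(column) for column in zip(*local_log_rewards)]

# Reference: posterior computed from the full, centralized data set.
global_log_reward = [
    lp + log_likelihood(t, flips) for t, lp in zip(thetas, log_prior)
]

product_posterior = normalize(product_log_reward)
global_posterior = normalize(global_log_reward)
max_gap = max(abs(a - b) for a, b in zip(product_posterior, global_posterior))
print(f"largest difference between the two posteriors: {max_gap:.2e}")
```

In this toy setting each client only touches its own shard, and the server needs nothing beyond the local log-rewards to form the global target. EP-GFlowNet replaces the enumerated tables with trained local GFlowNets, which is what makes the approach usable when the space is too large to enumerate.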

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
