LGJun 13, 2023
Discrete Graph Auto-EncoderYoann Boget, Magda Gregorova, Alexandros Kalousis
Despite advances in generative methods, accurately modeling the distribution of graphs remains a challenging task primarily because of the absence of predefined or inherent unique graph representation. Two main strategies have emerged to tackle this issue: 1) restricting the number of possible representations by sorting the nodes, or 2) using permutation-invariant/equivariant functions, specifically Graph Neural Networks (GNNs). In this paper, we introduce a new framework named Discrete Graph Auto-Encoder (DGAE), which leverages the strengths of both strategies and mitigate their respective limitations. In essence, we propose a strategy in 2 steps. We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations, each node being represented by a sequence of quantized vectors. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model based on the Transformer architecture. Through multiple experimental evaluations, we demonstrate the competitive performances of our model in comparison to the existing state-of-the-art across various datasets. Various ablation studies support the interest of our method.
LGDec 1, 2022
GrannGAN: Graph annotation generative adversarial networksYoann Boget, Magda Gregorova, Alexandros Kalousis
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton. The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases. In the first it models the distribution of features associated with the nodes of the given graph, in the second it complements the edge features conditionally on the node features. We follow the strategy of implicit distribution modelling via generative adversarial network (GAN) combined with permutation equivariant message passing architecture operating over the sets of nodes and edges. This enables generating the feature vectors of all the graph objects in one go (in 2 phases) as opposed to a much slower one-by-one generations of sequential models, prevents the need for expensive graph matching procedures usually needed for likelihood-based generative models, and uses efficiently the network capacity by being insensitive to the particular node ordering in the graph representation. To the best of our knowledge, this is the first method that models the feature distribution along the graph skeleton allowing for generations of annotated graphs with user specified structures. Our experiments demonstrate the ability of our model to learn complex structured distributions through quantitative evaluation over three annotated graph datasets.
LGMar 30
Unrestrained Simplex Denoising for Discrete Data. A Non-Markovian Approach Applied to Graph GenerationYoann Boget, Alexandros Kalousis
Denoising models such as Diffusion or Flow Matching have recently advanced generative modeling for discrete structures, yet most approaches either operate directly in the discrete state space, causing abrupt state changes. We introduce simplex denoising, a simple yet effective generative framework that operates on the probability simplex. The key idea is a non-Markovian noising scheme in which, for a given clean data point, noisy representations at different times are conditionally independent. While preserving the theoretical guarantees of denoising-based generative models, our method removes unnecessary constraints, thereby improving performance and simplifying the formulation. Empirically, \emph{unrestrained simplex denoising} surpasses strong discrete diffusion and flow-matching baselines across synthetic and real-world graph benchmarks. These results highlight the probability simplex as an effective framework for discrete generative modeling.
LGMar 25, 2024
GLAD: Improving Latent Graph Generative Modeling with Simple QuantizationVan Khoa Nguyen, Yoann Boget, Frantzeska Lavda et al.
Learning graph generative models over latent spaces has received less attention compared to models that operate on the original data space and has so far demonstrated lacklustre performance. We present GLAD a latent space graph generative model. Unlike most previous latent space graph generative models, GLAD operates on a discrete latent space that preserves to a significant extent the discrete nature of the graph structures making no unnatural assumptions such as latent space continuity. We learn the prior of our discrete latent space by adapting diffusion bridges to its structure. By operating over an appropriately constructed latent space we avoid relying on decompositions that are often used in models that operate in the original data space. We present experiments on a series of graph benchmark datasets that demonstrates GLAD as the first equivariant latent graph generative method achieves competitive performance with the state of the art baselines.
LGMar 31
Hierarchical Discrete Flow Matching for Graph GenerationYoann Boget, Pablo Strasser, Alexandros Kalousis
Denoising-based models, including diffusion and flow matching, have led to substantial advances in graph generation. Despite this progress, such models remain constrained by two fundamental limitations: a computational cost that scales quadratically with the number of nodes and a large number of function evaluations required during generation. In this work, we introduce a novel hierarchical generative framework that reduces the number of node pairs that must be evaluated and adopts discrete flow matching to significantly decrease the number of denoising iterations. We empirically demonstrate that our approach more effectively captures graph distributions while substantially reducing generation time.
LGMar 27, 2025
Simple and Critical Iterative Denoising: A Recasting of Discrete Diffusion in Graph GenerationYoann Boget
Discrete Diffusion and Flow Matching models have significantly advanced generative modeling for discrete structures, including graphs. However, the dependencies between intermediate noisy states lead to error accumulation and propagation during the reverse denoising process - a phenomenon known as compounding denoising errors. To address this problem, we propose a novel framework called Simple Iterative Denoising, which simplifies discrete diffusion and circumvents the issue by assuming conditional independence between intermediate states. Additionally, we enhance our model by incorporating a Critic. During generation, the Critic selectively retains or corrupts elements in an instance based on their likelihood under the data distribution. Our empirical evaluations demonstrate that the proposed method significantly outperforms existing discrete diffusion baselines in graph generation tasks.
LGDec 7, 2021
Permutation Equivariant Generative Adversarial Networks for GraphsYoann Boget, Magda Gregorova, Alexandros Kalousis
One of the most discussed issues in graph generative modeling is the ordering of the representation. One solution consists of using equivariant generative functions, which ensure the ordering invariance. After having discussed some properties of such functions, we propose 3G-GAN, a 3-stages model relying on GANs and equivariant functions. The model is still under development. However, we present some encouraging exploratory experiments and discuss the issues still to be addressed.
MLOct 18, 2019
Adversarial Regression. Generative Adversarial Networks for Non-Linear Regression: Theory and AssessmentYoann Boget
Adversarial Regression is a proposition to perform high dimensional non-linear regression with uncertainty estimation. We used Conditional Generative Adversarial Network to obtain an estimate of the full predictive distribution for a new observation. Generative Adversarial Networks (GAN) are implicit generative models which produce samples from a distribution approximating the distribution of the data. The conditional version of it (CGAN) takes the following expression: $\min\limits_G \max\limits_D V(D, G) = \mathbb{E}_{x\sim p_{r}(x)} [log(D(x, y))] + \mathbb{E}_{z\sim p_{z}(z)} [log (1-D(G(z, y)))]$. An approximate solution can be found by training simultaneously two neural networks to model D and G and feeding G with a random noise vector $z$. After training, we have that $G(z, y)\mathrel{\dot\sim} p_{data}(x, y)$. By fixing $y$, we have $G(z|y) \mathrel{\dot\sim} p{data}(x|y)$. By sampling $z$, we can therefore obtain samples following approximately $p(x|y)$, which is the predictive distribution of $x$ for a new $y$. We ran experiments to test various loss functions, data distributions, sample size, size of the noise vector, etc. Even if we observed differences, no experiment outperformed consistently the others. The quality of CGAN for regression relies on fine-tuning a range of hyperparameters. In a broader view, the results show that CGANs are very promising methods to perform uncertainty estimation for high dimensional non-linear regression.