LGDec 21, 2022
A Non-Asymptotic Analysis of Oversmoothing in Graph Neural NetworksXinyi Wu, Zhengdao Chen, William Wang et al.
Oversmoothing is a central challenge of building more powerful Graph Neural Networks (GNNs). While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when applying graph convolutions -- an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. By quantifying these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM), we show that oversmoothing happens once the mixing effect starts to dominate the denoising effect, and the number of layers required for this transition is $O(\log N/\log (\log N))$ for sufficiently dense graphs with $N$ nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR), or equivalently, the effects of initial residual connections on oversmoothing. Our results suggest that while PPR mitigates oversmoothing at deeper layers, PPR-based architectures still achieve their best performance at a shallow depth and are outperformed by the graph convolution approach on certain graphs. Finally, we support our theoretical results with numerical experiments, which further suggest that the oversmoothing phenomenon observed in practice can be magnified by the difficulty of optimizing deep GNN models.
LGApr 22, 2022
On Feature Learning in Neural Networks with Global Convergence GuaranteesZhengdao Chen, Eric Vanden-Eijnden, Joan Bruna
We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
LGOct 28, 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural NetworksZhengdao Chen, Eric Vanden-Eijnden, Joan Bruna
To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NN, establishing theoretical guarantees of its convergence under gradient flow training as well as its approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, we prove that its training loss in $L_2$ regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove Rademacher complexity bounds for these spaces. Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors while both exhibiting feature learning.
LGJul 3, 2023
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function SpaceZhengdao Chen
To characterize the function space explored by neural networks (NNs) is an important aspect of learning theory. In this work, noticing that a multi-layer NN generates implicitly a hierarchy of reproducing kernel Hilbert spaces (RKHSs) - named a neural Hilbert ladder (NHL) - we define the function space as an infinite union of RKHSs, which generalizes the existing Barron space theory of two-layer NNs. We then establish several theoretical properties of the new space. First, we prove a correspondence between functions expressed by L-layer NNs and those belonging to L-level NHLs. Second, we prove generalization guarantees for learning an NHL with a controlled complexity measure. Third, we derive a non-Markovian dynamics of random fields that governs the evolution of the NHL which is induced by the training of multi-layer NNs in an infinite-width mean-field limit. Fourth, we show examples of depth separation in NHLs under the ReLU activation function. Finally, we perform numerical experiments to illustrate the feature learning aspect of NN training through the lens of NHLs.
LGFeb 26, 2025
On the Interpolation Effect of Score Smoothing in Diffusion ModelsZhengdao Chen
Score-based diffusion models have achieved remarkable progress in various domains with the ability to generate new data samples that do not exist in the training set. In this work, we study the hypothesis that such creativity arises from an interpolation effect caused by a smoothing of the empirical score function. Focusing on settings where the training set lies uniformly in a one-dimensional subspace, we show theoretically how regularized two-layer ReLU neural networks tend to learn approximately a smoothed version of the empirical score function, and further probe the interplay between score smoothing and the denoising dynamics with analytical solutions and numerical experiments. In particular, we demonstrate how a smoothed score function can lead to the generation of samples that interpolate the training data along their subspace while avoiding full memorization. Moreover, we present experimental evidence that learning score functions with neural networks indeed induces a score smoothing effect, including in simple nonlinear settings and without explicit regularization.
LGOct 28, 2020
On Graph Neural Networks versus Graph-Augmented MLPsLei Chen, Zhengdao Chen, Joan Bruna
From the perspective of expressive power, this work compares multi-layer Graph Neural Networks (GNNs) with a simplified alternative that we call Graph-Augmented Multi-Layer Perceptrons (GA-MLPs), which first augments node features with certain multi-hop operators on the graph and then applies an MLP in a node-wise fashion. From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weifeiler-Lehman (WL) test. However, by viewing them as node-level functions and examining the equivalence classes they induce on rooted graphs, we prove a separation in expressive power between GA-MLPs and GNNs that grows exponentially in depth. In particular, unlike GNNs, GA-MLPs are unable to count the number of attributed walks. We also demonstrate via community detection experiments that GA-MLPs can be limited by their choice of operator family, as compared to GNNs with higher flexibility in learning.
PRAug 21, 2020
A Dynamical Central Limit Theorem for Shallow Neural NetworksZhengdao Chen, Grant M. Rotskoff, Joan Bruna et al.
Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the width tends towards infinity. At initialization, the random sampling of the parameters leads to deviations from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, since gradient descent induces correlations among the parameters, it is of interest to analyze how these fluctuations evolve. Here, we use a dynamical CLT to prove that the asymptotic fluctuations around the mean limit remain bounded in mean square throughout training. The upper bound is given by a Monte-Carlo resampling error, with a variance that that depends on the 2-norm of the underlying measure, which also controls the generalization error. This motivates the use of this 2-norm as a regularization term during training. Furthermore, if the mean-field dynamics converges to a measure that interpolates the training data, we prove that the asymptotic deviation eventually vanishes in the CLT scaling. We also complement these results with numerical experiments.
LGFeb 10, 2020
Can Graph Neural Networks Count Substructures?Zhengdao Chen, Lei Chen, Soledad Villar et al.
The ability to detect and count certain substructures in graphs is important for solving many tasks on graph-structured data, especially in the contexts of computational chemistry and biology as well as social network analysis. Inspired by this, we propose to study the expressive power of graph neural networks (GNNs) via their ability to count attributed graph substructures, extending recent works that examine their power in graph isomorphism testing and function approximation. We distinguish between two types of substructure counting: induced-subgraph-count and subgraph-count, and establish both positive and negative answers for popular GNN architectures. Specifically, we prove that Message Passing Neural Networks (MPNNs), 2-Weisfeiler-Lehman (2-WL) and 2-Invariant Graph Networks (2-IGNs) cannot perform induced-subgraph-count of substructures consisting of 3 or more nodes, while they can perform subgraph-count of star-shaped substructures. As an intermediary step, we prove that 2-WL and 2-IGNs are equivalent in distinguishing non-isomorphic graphs, partly answering an open problem raised in Maron et al. (2019). We also prove positive results for k-WL and k-IGNs as well as negative results for k-WL with a finite number of iterations. We then conduct experiments that support the theoretical results for MPNNs and 2-IGNs. Moreover, motivated by substructure counting and inspired by Murphy et al. (2019), we propose the Local Relational Pooling model and demonstrate that it is not only effective for substructure counting but also able to achieve competitive performance on molecular prediction tasks.
LGSep 29, 2019
Symplectic Recurrent Neural NetworksZhengdao Chen, Jianyu Zhang, Martin Arjovsky et al.
We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. An SRNN models the Hamiltonian function of the system by a neural network and furthermore leverages symplectic integration, multiple-step training and initial state optimization to address the challenging numerical issues associated with Hamiltonian systems. We show that SRNNs succeed reliably on complex and noisy Hamiltonian systems. We also show how to augment the SRNN integration scheme in order to handle stiff dynamical systems such as bouncing billiards.
LGMay 29, 2019
On the equivalence between graph isomorphism testing and function approximation with GNNsZhengdao Chen, Soledad Villar, Lei Chen et al.
Graph Neural Networks (GNNs) have achieved much success on graph-structured data. In light of this, there have been increasing interests in studying their expressive power. One line of work studies the capability of GNNs to approximate permutation-invariant functions on graphs, and another focuses on the their power as tests for graph isomorphism. Our work connects these two perspectives and proves their equivalence. We further develop a framework of the expressive power of GNNs that incorporates both of these viewpoints using the language of sigma-algebra, through which we compare the expressive power of different types of GNNs together with other graph isomorphism tests. In particular, we prove that the second-order Invariant Graph Network fails to distinguish non-isomorphic regular graphs with the same degree. Then, we extend it to a new architecture, Ring-GNN, which succeeds in distinguishing these graphs and achieves good performances on real-world datasets.
MLMay 23, 2017
Supervised Community Detection with Line Graph Neural NetworksZhengdao Chen, Xiang Li, Joan Bruna
Traditionally, community detection in graphs can be solved using spectral methods or posterior inference under probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational detection thresholds in terms of the signal-to-noise ratio. By recasting community detection as a node-wise classification problem on graphs, we can also study it from a learning perspective. We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. We show that, in a data-driven manner and without access to the underlying generative models, they can match or even surpass the performance of the belief propagation algorithm on binary and multi-class stochastic block models, which is believed to reach the computational threshold. In particular, we propose to augment GNNs with the non-backtracking operator defined on the line graph of edge adjacencies. Our models also achieve good performance on real-world datasets. In addition, we perform the first analysis of the optimization landscape of training linear GNNs for community detection problems, demonstrating that under certain simplifications and assumptions, the loss values at local and global minima are not far apart.