Bobak T. Kiani

h-index: 17

Papers: 11

Citations: 217

Novelty: 53%

AI Score: 47

Ranked #54,352 of 205,806 authors (top 26%) · #12,258 in LG (top 29%)

11 Papers

LG · Jul 11, 2023 · Code

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

Grégoire Mialon, Quentin Garrido, Hannah Lawrence et al.

Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs. Code: https://github.com/facebookresearch/SSLForPDEs.

LG · Feb 22, 2023

Equivariant Polynomials for Graph Neural Networks

Omri Puny, Derek Lim, Bobak T. Kiani et al. · MIT, NVIDIA

Graph Neural Networks (GNN) are inherently limited in their expressive power. Recent seminal works (Xu et al., 2019; Morris et al., 2019b) introduced the Weisfeiler-Lehman (WL) hierarchy as a measure of expressive power. Although this hierarchy has propelled significant advances in GNN analysis and architecture developments, it suffers from several significant limitations. These include a complex definition that lacks direct guidance for model improvement and a WL hierarchy that is too coarse to study current GNNs. This paper introduces an alternative expressive power hierarchy based on the ability of GNNs to calculate equivariant polynomials of a certain degree. As a first step, we provide a full characterization of all equivariant graph polynomials by introducing a concrete basis, significantly generalizing previous results. Each basis element corresponds to a specific multi-graph, and its computation over some graph data input corresponds to a tensor contraction problem. Second, we propose algorithmic tools for evaluating the expressiveness of GNNs using tensor contraction sequences, and calculate the expressive power of popular GNNs. Finally, we enhance the expressivity of common GNN architectures by adding polynomial features or additional operations / aggregations inspired by our theory. These enhanced GNNs demonstrate state-of-the-art results in experiments across multiple graph learning benchmarks.
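
The correspondence between a multi-graph and a tensor contraction can be illustrated with a minimal sketch. It assumes the simplest case, the triangle multi-graph acting on an adjacency matrix, and is not the paper's basis construction or code; the function names `triangle_invariant` and `closed_walks_per_node` are hypothetical.

```python
import numpy as np

def triangle_invariant(adj):
    # Degree-3 invariant polynomial: sum_{i,j,k} A_ij A_jk A_ki,
    # i.e. the contraction pattern given by a triangle multi-graph.
    return np.einsum("ij,jk,ki->", adj, adj, adj)

def closed_walks_per_node(adj):
    # Same contraction with index i left open: a node-level
    # (permutation-equivariant) polynomial of degree 3.
    return np.einsum("ij,jk,ki->i", adj, adj, adj)

# Toy graph (an assumption for illustration): triangle 0-1-2 plus pendant node 3.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

# Relabel the nodes with a random permutation matrix P.
rng = np.random.default_rng(0)
P = np.eye(4)[rng.permutation(4)]
adj_perm = P @ adj @ P.T

invariant_ok = np.isclose(triangle_invariant(adj), triangle_invariant(adj_perm))
equivariant_ok = np.allclose(closed_walks_per_node(adj_perm),
                             P @ closed_walks_per_node(adj))
```

Relabeling the nodes leaves the fully contracted value unchanged and permutes the node-level output in the same way as the input, which is the invariance and equivariance property the abstract refers to.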

ML · Feb 6, 2023

The SSL Interplay: Augmentations, Inductive Bias, and Generalization

Vivien Cabannes, Bobak T. Kiani, Randall Balestriero et al.

Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory friendly setup, and highlight several insights for SSL practitioners that arise from our theory.

LG · Sep 29, 2022

Joint Embedding Self-Supervised Learning in the Kernel Regime

Bobak T. Kiani, Randall Balestriero, Yubei Chen et al.

The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel. In this kernel regime, we derive methods to find the optimal form of the output representations for contrastive and non-contrastive loss functions. This procedure produces a new representation space with an inner product denoted as the induced kernel which generally correlates points which are related by an augmentation in kernel space and de-correlates points otherwise. We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.

93.1 · QUANT-PH · Apr 22

SYK thermal expectations are classically easy at any temperature

Alexander Zlokapa, Bobak T. Kiani

Estimating thermal expectations of local observables is a natural target for quantum advantage. We give a simple classical algorithm that approximates thermal expectations for Gibbs states of local Hamiltonians, and we show it has quasi-polynomial cost $n^{O(\log (n/ε))}$ for all temperatures above a phase transition in the free energy. For many natural models, this coincides with the entire fast-mixing, quantumly easy phase. Our results apply to the Sachdev-Ye-Kitaev (SYK) model at any constant temperature due to its absence of a phase transition -- despite its entanglement, sign problem, and polynomial quantum circuit lower bounds. Beyond SYK, we rigorously establish a universal classically easy high-temperature phase for all local, bounded-degree Hamiltonians and show that it extends to temperatures strictly colder than the death of entanglement transition.

QUANT-PH · Mar 7

Optimizing Sparse SYK

Matthew Ding, Robbie King, Bobak T. Kiani et al.

Finding the ground state of strongly-interacting fermionic systems is often the prerequisite for fully understanding both quantum chemistry and condensed matter systems. The Sachdev--Ye--Kitaev (SYK) model is a representative example of such a system; it is particularly interesting not only due to the existence of efficient quantum algorithms preparing approximations to the ground state such as Hastings--O'Donnell (STOC 2022), but also known no-go results for many classical ansatzes in preparing low-energy states. However, this quantum-classical separation is known to \emph{not} persist when the SYK model is sufficiently sparsified, i.e., when terms in the model are discarded with probability $1-p$, where $p=\Theta(1/n^3)$ and $n$ is the system size. This raises the question of how robust the quantum and classical complexities of the SYK model are to sparsification. In this work we initiate the study of the sparse SYK model where $p \in [\Theta(1/n^3),1]$ and show there indeed exists a certain robustness of sparsification. We prove that with high probability, Gaussian states achieve only a $\Theta(1/\sqrt{n})$-factor approximation to the true ground state energy of sparse SYK for all $p\geq\Omega(\log n/n^2)$, and that Gaussian states cannot achieve constant-factor approximations unless $p \leq O(\log^2 n/n^3)$. Additionally, we prove that the quantum algorithm of Hastings--O'Donnell still achieves a constant-factor approximation to the ground state energy when $p\geq\Omega(\log n/n)$. Combined, these show a provable separation between classical algorithms outputting Gaussian states and efficient quantum algorithms for the goal of finding approximate sparse SYK ground states whenever $p \geq \Omega(\log n/n)$, extending the analogous $p=1$ result of Hastings--O'Donnell.

LG · Jan 3, 2024

On the hardness of learning under symmetries

Bobak T. Kiani, Thien Le, Hannah Lawrence et al.

We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.

LG · Jun 3, 2024

Hardness of Learning Neural Networks under the Manifold Hypothesis

Bobak T. Kiani, Jason Wang, Melanie Weber

The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.

LG · Oct 12, 2021

Implicit Bias of Linear Equivariant Networks

Hannah Lawrence, Kristian Georgiev, Andrew Dienes et al.

Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their \emph{explicit} symmetry bias, a recent line of work has proposed that the \emph{implicit} bias of training algorithms on particular architectures is key to understanding generalization for overparameterized neural nets. In this context, we show that $L$-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as band-limited G-CNNs over infinite groups. We validate our theorems via experiments on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.
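
As a rough illustration of the $2/L$-Schatten regularizer named above, the sketch below computes a Schatten $p$-(quasi-)norm from singular values with $p = 2/L$. It is not the paper's code: the helper names `schatten_norm` and `implicit_bias_exponent` are hypothetical, and the toy matrices stand in for the Fourier matrix coefficients discussed in the abstract.

```python
import numpy as np

def schatten_norm(matrix, p):
    # Schatten p-(quasi-)norm: the l^p norm of the singular values.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values ** p) ** (1.0 / p))

def implicit_bias_exponent(num_layers):
    # Exponent p = 2/L for an L-layer linear network, as stated in the abstract.
    return 2.0 / num_layers

full_rank = np.diag([3.0, 4.0])  # singular values 3 and 4
low_rank = np.diag([5.0, 0.0])   # same Frobenius norm, rank 1

frobenius = schatten_norm(full_rank, implicit_bias_exponent(1))  # p = 2
nuclear = schatten_norm(full_rank, implicit_bias_exponent(2))    # p = 1

# For L = 4 (p = 1/2) the quasi-norm penalizes the rank-2 matrix more heavily
# than the rank-1 matrix with the same Frobenius norm.
p_deep = implicit_bias_exponent(4)
prefers_low_rank = schatten_norm(low_rank, p_deep) < schatten_norm(full_rank, p_deep)
```

For $L = 1$ this reduces to the Frobenius norm and for $L = 2$ to the nuclear norm; for larger $L$ the exponent drops below 1, which is one way to read the low-rank preference described in the abstract.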

QUANT-PH · Sep 23, 2021

Quantum algorithms for group convolution, cross-correlation, and equivariant transformations

Grecia Castelazo, Quynh T. Nguyen, Giacomo De Palma et al.

Group convolutions and cross-correlations, which are equivariant to the actions of group elements, are commonly used in mathematics to analyze or take advantage of symmetries inherent in a given problem setting. Here, we provide efficient quantum algorithms for performing linear group convolutions and cross-correlations on data stored as quantum states. Runtimes for our algorithms are logarithmic in the dimension of the group thus offering an exponential speedup compared to classical algorithms when input data is provided as a quantum state and linear operations are well conditioned. Motivated by the rich literature on quantum algorithms for solving algebraic problems, our theoretical framework opens a path for quantizing many algorithms in machine learning and numerical methods that employ group operations.
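
For readers unfamiliar with group convolution, here is a minimal classical sketch on the cyclic group $\mathbb{Z}_n$, checking that the convolution commutes with the group action (a cyclic shift). It is an assumption-laden illustration rather than the quantum algorithm of the paper; the function name `cyclic_group_conv` is hypothetical.

```python
import numpy as np

def cyclic_group_conv(f, k):
    # (f * k)(g) = sum_h f(h) k(g - h), with group elements g, h in Z_n.
    n = len(f)
    out = np.zeros(n)
    for g in range(n):
        for h in range(n):
            out[g] += f[h] * k[(g - h) % n]
    return out

rng = np.random.default_rng(0)
f = rng.normal(size=8)  # signal on Z_8
k = rng.normal(size=8)  # filter on Z_8

# Equivariance: acting with a group element (shift by 3) before or after
# the convolution gives the same result.
shifted_then_conv = cyclic_group_conv(np.roll(f, 3), k)
conv_then_shifted = np.roll(cyclic_group_conv(f, k), 3)
is_equivariant = np.allclose(shifted_then_conv, conv_then_shifted)

# For this abelian group the Fourier transform diagonalizes the convolution.
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(k)))
matches_fft = np.allclose(cyclic_group_conv(f, k), via_fft)
```

The direct double loop costs $O(n^2)$ operations and the FFT route $O(n \log n)$; these are the classical baselines against which the abstract's logarithmic quantum runtime is compared.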

ML · Apr 13, 2020

Adversarial Robustness Guarantees for Random Deep Neural Networks

Giacomo De Palma, Bobak T. Kiani, Seth Lloyd

The reliability of deep learning algorithms is fundamentally challenged by the existence of adversarial examples, which are incorrectly classified inputs that are extremely close to a correctly classified input. We explore the properties of adversarial examples for deep neural networks with random weights and biases, and prove that for any $p\ge1$, the $\ell^p$ distance of any given input from the classification boundary scales as one over the square root of the dimension of the input times the $\ell^p$ norm of the input. The results are based on the recently proved equivalence between Gaussian processes and deep neural networks in the limit of infinite width of the hidden layers, and are validated with experiments on both random deep neural networks and deep neural networks trained on the MNIST and CIFAR10 datasets. The results constitute a fundamental advance in the theoretical understanding of adversarial examples, and open the way to a thorough theoretical characterization of the relation between network architecture and robustness to adversarial perturbations.