Chaitanya K. Joshi

LG
h-index56
22papers
2,809citations
Novelty42%
AI Score54

22 Papers

LGJul 17, 2023
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Xuan Zhang, Limei Wang, Jacob Helwig et al. · cambridge, mit

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

LGJan 23, 2023Code
On the Expressive Power of Geometric Graph Neural Networks

Chaitanya K. Joshi, Cristian Bodnar, Simon V. Mathis et al. · cambridge

The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test. However, standard GNNs and the WL framework are inapplicable for geometric graphs embedded in Euclidean space, such as biomolecules, materials, and other physical systems. In this work, we propose a geometric version of the WL test (GWL) for discriminating geometric graphs while respecting the underlying physical symmetries: permutations, rotation, reflection, and translation. We use GWL to characterise the expressive power of geometric GNNs that are invariant or equivariant to physical symmetries in terms of distinguishing geometric graphs. GWL unpacks how key design choices influence geometric GNN expressivity: (1) Invariant layers have limited expressivity as they cannot distinguish one-hop identical geometric graphs; (2) Equivariant layers distinguish a larger class of graphs by propagating geometric information beyond local neighbourhoods; (3) Higher order tensors and scalarisation enable maximally powerful geometric GNNs; and (4) GWL's discrimination-based perspective is equivalent to universal approximation. Synthetic experiments supplementing our results are available at \url{https://github.com/chaitjo/geometric-gnn-dojo}

LGDec 4, 2025Code
LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models

Siddharth Betala, Samuel P. Gleason, Ali Ramlaoui et al.

Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet, the lack of standardized evaluation frameworks makes it challenging to evaluate, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that an increase in stability leads to a decrease in novelty and diversity on average, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.

LGMar 5, 2025Code
All-atom Diffusion Transformers: Unified generative modelling of molecules and materials

Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao et al. · baidu, cmu

Diffusion models are the standard toolkit for generative modelling of 3D atomic systems. However, for different types of atomic systems -- such as molecules and materials -- the generative processes are usually highly specific to the target system despite the underlying physics being the same. We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representations of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. Experiments on MP20, QM9 and GEOM-DRUGS datasets demonstrate that jointly trained ADiT generates realistic and valid molecules as well as materials, obtaining state-of-the-art results on par with molecule and crystal-specific models. ADiT uses standard Transformers with minimal inductive biases for both the autoencoder and diffusion model, resulting in significant speedups during training and inference compared to equivariant diffusion models. Scaling ADiT up to half a billion parameters predictably improves performance, representing a step towards broadly generalizable foundation models for generative chemistry. Open source code: https://github.com/facebookresearch/all-atom-diffusion-transformer

LGDec 7, 2025
Multi-Scale Protein Structure Modelling with Geometric Graph U-Nets

Chang Liu, Vivian Li, Linus Leong et al. · cambridge

Geometric Graph Neural Networks (GNNs) and Transformers have become state-of-the-art for learning from 3D protein structures. However, their reliance on message passing prevents them from capturing the hierarchical interactions that govern protein function, such as global domains and long-range allosteric regulation. In this work, we argue that the network architecture itself should mirror this biological hierarchy. We introduce Geometric Graph U-Nets, a new class of models that learn multi-scale representations by recursively coarsening and refining the protein graph. We prove that this hierarchical design can theoretically more expressive than standard Geometric GNNs. Empirically, on the task of protein fold classification, Geometric U-Nets substantially outperform invariant and equivariant baselines, demonstrating their ability to learn the global structural patterns that define protein folds. Our work provides a principled foundation for designing geometric deep learning architectures that can learn the multi-scale structure of biomolecules.

LGFeb 17, 2025Code
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs

Batu El, Deepro Choudhury, Pietro Liò et al. · cambridge

We introduce Attention Graphs, a new tool for mechanistic interpretability of Graph Neural Networks (GNNs) and Graph Transformers based on the mathematical equivalence between message passing in GNNs and the self-attention mechanism in Transformers. Attention Graphs aggregate attention matrices across Transformer layers and heads to describe how information flows among input nodes. Through experiments on homophilous and heterophilous node classification tasks, we analyze Attention Graphs from a network science perspective and find that: (1) When Graph Transformers are allowed to learn the optimal graph structure using all-to-all attention among input nodes, the Attention Graphs learned by the model do not tend to correlate with the input/original graph structure; and (2) For heterophilous graphs, different Graph Transformer variants can achieve similar performance while utilising distinct information flow patterns. Open source code: https://github.com/batu-el/understanding-inductive-biases-of-gnns

LGJun 19, 2024Code
Evaluating representation learning on the protein structure universe

Arian R. Jamasb, Alex Morehead, Chaitanya K. Joshi et al.

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

BMJun 19, 2024Code
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Rishabh Anand, Chaitanya K. Joshi, Alex Morehead et al.

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design

LGMay 24, 2023Code
gRNAde: Geometric Deep Learning for 3D RNA inverse design

Chaitanya K. Joshi, Arian R. Jamasb, Ramon Viñas et al.

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. gRNAde uses a multi-state Graph Neural Network and autoregressive decoding to generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. (2010), gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent ribozyme. Experimental wet lab validation on 10 different structured RNA backbones finds that gRNAde has a success rate of 50% at designing pseudoknotted RNA structures, a significant advance over 35% for Rosetta. Open source code and tutorials are available at: https://github.com/chaitjo/geometric-rna-design

LGNov 9, 2021Code
On Representation Knowledge Distillation for Graph Neural Networks

Chaitanya K. Joshi, Fayao Liu, Xu Xun et al.

Knowledge distillation is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships defined over edges across the student and teacher's node embeddings. This paper studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving variant of LSP) as well as baselines from 2D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other. Our code is available at https://github.com/chaitjo/efficient-gnns

LGMar 2, 2020Code
Benchmarking Graph Neural Networks

Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu et al.

In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.

CVDec 24, 2019Code
Multi-Graph Transformer for Free-Hand Sketch Recognition

Peng Xu, Chaitanya K. Joshi, Xavier Bresson

Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with Convolutional Neural Networks (CNNs) or the temporal sequential property with Recurrent Neural Networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer (MGT), for learning representations of sketches from multiple graphs which simultaneously capture global and local geometric stroke structures, as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. Particularly, MGT applied on 414k sketches from Google QuickDraw: (i) achieves small recognition gap to the CNN-based performance upper bound (72.80% vs. 74.22%), and (ii) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work proposing to represent sketches as graphs and apply GNNs for sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.

LGDec 12, 2023
A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

Alexandre Duval, Simon V. Mathis, Chaitanya K. Joshi et al. · cambridge

Recent advances in computational modelling of atomic systems, spanning molecules, proteins, and materials, represent them as geometric graphs with atoms embedded as nodes in 3D Euclidean space. In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations. In recent years, Geometric Graph Neural Networks have emerged as the preferred machine learning architecture powering applications ranging from protein structure prediction to molecular simulations and material generation. Their specificity lies in the inductive biases they leverage - such as physical symmetries and chemical properties - to learn informative representations of these geometric graphs. In this opinionated paper, we provide a comprehensive and self-contained overview of the field of Geometric GNNs for 3D atomic systems. We cover fundamental background material and introduce a pedagogical taxonomy of Geometric GNN architectures: (1) invariant networks, (2) equivariant networks in Cartesian basis, (3) equivariant networks in spherical basis, and (4) unconstrained networks. Additionally, we outline key datasets and application areas and suggest future research directions. The objective of this work is to present a structured perspective on the field, making it accessible to newcomers and aiding practitioners in gaining an intuition for its mathematical abstractions.

LGJun 27, 2025
Transformers are Graph Neural Networks

Chaitanya K. Joshi

We establish connections between the Transformer architecture, originally introduced for natural language processing, and Graph Neural Networks (GNNs) for representation learning on graphs. We show how Transformers can be viewed as message passing GNNs operating on fully connected graphs of tokens, where the self-attention mechanism capture the relative importance of all tokens w.r.t. each-other, and positional encodings provide hints about sequential ordering or structure. Thus, Transformers are expressive set processing networks that learn relationships among input elements without being constrained by apriori graphs. Despite this mathematical connection to GNNs, Transformers are implemented via dense matrix operations that are significantly more efficient on modern hardware than sparse message passing. This leads to the perspective that Transformers are GNNs currently winning the hardware lottery.

LGJul 29, 2025
Multi-state Protein Design with DynamicMPNN

Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi et al.

Structural biology has long been dominated by the one sequence, one structure, one function paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions, achieving poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using Alphafold 3, DynamicMPNN outperforms ProteinMPNN by up to 25% on decoy-normalized RMSD and by 12% on sequence recovery across our challenging multi-state protein benchmark.

LGMay 30, 2023
Group Invariant Global Pooling

Kamil Bujel, Yonatan Gideoni, Chaitanya K. Joshi et al.

Much work has been devoted to devising architectures that build group-equivariant representations, while invariance is often induced using simple global pooling mechanisms. Little work has been done on creating expressive layers that are invariant to given symmetries, despite the success of permutation invariant pooling in various molecular tasks. In this work, we present Group Invariant Global Pooling (GIGP), an invariant pooling layer that is provably sufficiently expressive to represent a large class of invariant functions. We validate GIGP on rotated MNIST and QM9, showing improvements for the latter while attaining identical results for the former. By making the pooling process group orbit-aware, this invariant aggregation method leads to improved performance, while performing well-principled group aggregation.

CVAug 4, 2021
Point Discriminative Learning for Data-efficient 3D Point Cloud Analysis

Fayao Liu, Guosheng Lin, Chuan-Sheng Foo et al.

3D point cloud analysis has drawn a lot of research attention due to its wide applications. However, collecting massive labelled 3D point cloud data is both time-consuming and labor-intensive. This calls for data-efficient learning methods. In this work we propose PointDisc, a point discriminative learning method to leverage self-supervisions for data-efficient 3D point cloud classification and segmentation. PointDisc imposes a novel point discrimination loss on the middle and global level features produced by the backbone network. This point discrimination loss enforces learned features to be consistent with points belonging to the corresponding local shape region and inconsistent with randomly sampled noisy points. We conduct extensive experiments on 3D object classification, 3D semantic and part segmentation, showing the benefits of PointDisc for data-efficient learning. Detailed analysis demonstrate that PointDisc learns unsupervised features that well capture local and global geometry.

LGJun 12, 2020
Learning the Travelling Salesperson Problem Requires Rethinking Generalization

Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau et al.

End-to-end training of neural network solvers for graph combinatorial optimization problems such as the Travelling Salesperson Problem (TSP) have seen a surge of interest recently, but remain intractable and inefficient beyond graphs with few hundreds of nodes. While state-of-the-art learning-driven approaches for TSP perform closely to classical solvers when trained on trivially small sizes, they are unable to generalize the learnt policy to larger instances at practical scales. This work presents an end-to-end neural combinatorial optimization pipeline that unifies several recent papers in order to identify the inductive biases, model architectures and learning algorithms that promote generalization to instances larger than those seen in training. Our controlled experiments provide the first principled investigation into such zero-shot generalization, revealing that extrapolating beyond training data requires rethinking the neural combinatorial optimization pipeline, from network layers and learning paradigms to evaluation protocols. Additionally, we analyze recent advances in deep learning for routing problems through the lens of our pipeline and provide new directions to stimulate future research.

LGOct 16, 2019
On Learning Paradigms for the Travelling Salesman Problem

Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson

We explore the impact of learning paradigms on training deep neural networks for the Travelling Salesman Problem. We design controlled experiments to train supervised learning (SL) and reinforcement learning (RL) models on fixed graph sizes up to 100 nodes, and evaluate them on variable sized graphs up to 500 nodes. Beyond not needing labelled data, our results reveal favorable properties of RL over SL: RL training leads to better emergent generalization to variable graph sizes and is a key component for learning scale-invariant solvers for novel combinatorial problems.

LGJun 4, 2019
An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem

Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson

This paper introduces a new learning-based approach for approximately solving the Travelling Salesman Problem on 2D Euclidean graphs. We use deep Graph Convolutional Networks to build efficient TSP graph representations and output tours in a non-autoregressive manner via highly parallelized beam search. Our approach outperforms all recently proposed autoregressive deep learning techniques in terms of solution quality, inference speed and sample efficiency for problem instances of fixed graph sizes. In particular, we reduce the average optimality gap from 0.52% to 0.01% for 50 nodes, and from 2.26% to 1.39% for 100 nodes. Finally, despite improving upon other learning-based approaches for TSP, our approach falls short of standard Operations Research solvers.

EMApr 27, 2019
Working women and caste in India: A study of social disadvantage using feature attribution

Kuhu Joshi, Chaitanya K. Joshi

Women belonging to the socially disadvantaged caste-groups in India have historically been engaged in labour-intensive, blue-collar work. We study whether there has been any change in the ability to predict a woman's work-status and work-type based on her caste by interpreting machine learning models using feature attribution. We find that caste is now a less important determinant of work for the younger generation of women compared to the older generation. Moreover, younger women from disadvantaged castes are now more likely to be working in white-collar jobs.

CLJun 22, 2017
Personalization in Goal-Oriented Dialog

Chaitanya K. Joshi, Fei Mi, Boi Faltings

The main goal of modeling human conversation is to create agents which can interact with people in both open-ended and goal-oriented scenarios. End-to-end trained neural dialog systems are an important line of research for such generalized dialog models as they do not resort to any situation-specific handcrafting of rules. However, incorporating personalization into such systems is a largely unexplored topic as there are no existing corpora to facilitate such work. In this paper, we present a new dataset of goal-oriented dialogs which are influenced by speaker profiles attached to them. We analyze the shortcomings of an existing end-to-end dialog system based on Memory Networks and propose modifications to the architecture which enable personalization. We also investigate personalization in dialog as a multi-task learning problem, and show that a single model which shares features among various profiles outperforms separate models for each profile.