LGAug 12, 2024Code
Open-Source Molecular Processing Pipeline for Generating MoleculesV Shreyas, Jose Siguenza, Karan Bania et al. · cmu
Generative models for molecules have shown considerable promise for use in computational chemistry, but remain difficult to use for non-experts. For this reason, we introduce open-source infrastructure for easily building generative molecular models into the widely used DeepChem [Ramsundar et al., 2019] library with the aim of creating a robust and reusable molecular generation pipeline. In particular, we add high quality PyTorch [Paszke et al., 2019] implementations of the Molecular Generative Adversarial Networks (MolGAN) [Cao and Kipf, 2022] and Normalizing Flows [Papamakarios et al., 2021]. Our implementations show strong performance comparable with past work [Kuznetsov and Polykovskiy, 2021, Cao and Kipf, 2022].
QMNov 18, 2024Code
A Modular Open Source Framework for Genomic Variant CallingAnkita Vaishnobi Bisoi, Shreyas V, Jose Siguenza et al.
Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. In particular, we introduce a variant calling pipeline that leverages DeepVariant's convolutional neural network (CNN) architecture to improve the accuracy and reliability of variant detection. The implemented pipeline includes stages for realignment of sequencing reads, candidate variant detection, and pileup image generation, followed by variant classification using a modified Inception v3 model. Our work adds a modular and extensible variant calling framework to the DeepChem framework and enables future work integrating DeepChem's drug discovery infrastructure more tightly with bioinformatics pipelines.
LGFeb 20Code
DeepmechanicsAbhay Shinde, Aryan Amit Barsainyan, Jose Siguenza et al.
Physics-informed deep learning models have emerged as powerful tools for learning dynamical systems. These models directly encode physical principles into network architectures. However, systematic benchmarking of these approaches across diverse physical phenomena remains limited, particularly in conservative and dissipative systems. In addition, benchmarking that has been done thus far does not integrate out full trajectories to check stability. In this work, we benchmark three prominent physics-informed architectures such as Hamiltonian Neural Networks (HNN), Lagrangian Neural Networks (LNN), and Symplectic Recurrent Neural Networks (SRNN) using the DeepChem framework, an open-source scientific machine learning library. We evaluate these models on six dynamical systems spanning classical conservative mechanics (mass-spring system, simple pendulum, double pendulum, and three-body problem, spring-pendulum) and non-conservative systems with contact (bouncing ball). We evaluate models by computing error on predicted trajectories and evaluate error both quantitatively and qualitatively. We find that all benchmarked models struggle to maintain stability for chaotic or nonconservative systems. Our results suggest that more research is needed for physics-informed deep learning models to learn robust models of classical mechanical systems.
LGOct 19, 2025
DeepChem Equivariant: SE(3)-Equivariant Support in an Open-Source Molecular Machine Learning LibraryJose Siguenza, Bharath Ramsundar
Neural networks that incorporate geometric relationships respecting SE(3) group transformations (e.g. rotations and translations) are increasingly important in molecular applications, such as molecular property prediction, protein structure modeling, and materials design. These models, known as SE(3)-equivariant neural networks, ensure outputs transform predictably with input coordinate changes by explicitly encoding spatial atomic positions. Although libraries such as E3NN [4] and SE(3)-TRANSFORMER [3 ] offer powerful implementations, they often require substantial deep learning or mathematical prior knowledge and lack complete training pipelines. We extend DEEPCHEM [ 13] with support for ready-to-use equivariant models, enabling scientists with minimal deep learning background to build, train, and evaluate models, such as SE(3)-Transformer and Tensor Field Networks. Our implementation includes equivariant models, complete training pipelines, and a toolkit of equivariant utilities, supported with comprehensive tests and documentation, to facilitate both application and further development of SE(3)-equivariant models.