LGSep 10, 2022
Structured Q-learning For Antibody DesignAlexander I. Cowen-Rivers, Philip John Gorinski, Aivar Sootla et al.
Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.
LGAug 8, 2022
Adversarial robustness of VAEs through the lens of local geometryAsif Khan, Amos Storkey
In an unsupervised attack on variational autoencoders (VAEs), an adversary finds a small perturbation in an input sample that significantly changes its latent space encoding, thereby compromising the reconstruction for a fixed decoder. A known reason for such vulnerability is the distortions in the latent space resulting from a mismatch between approximated latent posterior and a prior distribution. Consequently, a slight change in an input sample can move its encoding to a low/zero density region in the latent space resulting in an unconstrained generation. This paper demonstrates that an optimal way for an adversary to attack VAEs is to exploit a directional bias of a stochastic pullback metric tensor induced by the encoder and decoder networks. The pullback metric tensor of an encoder measures the change in infinitesimal latent volume from an input to a latent space. Thus, it can be viewed as a lens to analyse the effect of input perturbations leading to latent space distortions. We propose robustness evaluation scores using the eigenspectrum of a pullback metric tensor. Moreover, we empirically show that the scores correlate with the robustness parameter $β$ of the $β-$VAE. Since increasing $β$ also degrades reconstruction quality, we demonstrate a simple alternative using \textit{mixup} training to fill the empty regions in the latent space, thus improving robustness with improved reconstruction.
LGAug 19, 2023
Contrastive Learning for Non-Local Graphs with Multi-Resolution Structural ViewsAsif Khan, Amos Storkey
Learning node-level representations of heterophilic graphs is crucial for various applications, including fraudster detection and protein function prediction. In such graphs, nodes share structural similarity identified by the equivalence of their connectivity which is implicitly encoded in the form of higher-order hierarchical information in the graphs. The contrastive methods are popular choices for learning the representation of nodes in a graph. However, existing contrastive methods struggle to capture higher-order graph structures. To address this limitation, we propose a novel multiview contrastive learning approach that integrates diffusion filters on graphs. By incorporating multiple graph views as augmentations, our method captures the structural equivalence in heterophilic graphs, enabling the discovery of hidden relationships and similarities not apparent in traditional node representations. Our approach outperforms baselines on synthetic and real structural datasets, surpassing the best baseline by $16.06\%$ on Cornell, $3.27\%$ on Texas, and $8.04\%$ on Wisconsin. Additionally, it consistently achieves superior performance on proximal tasks, demonstrating its effectiveness in uncovering structural information and improving downstream applications.
LGApr 13
Physics-informed AI Accelerated Retention Analysis of Ferroelectric Vertical NAND: From Day-Scale TCAD to Second-Scale Surrogate ModelGyujun Jeong, Sungwon Cho, Minji Shon et al.
Ferroelectric field-effect transistors (FeFET)-based vertical NAND (Fe-VNAND) has emerged as a promising candidate to overcome z-scaling limitations with lower programming voltages. However, the data retention of 3D Fe-VNAND is hindered by the complex interaction between charge detrapping and ferroelectric depolarization. Developing optimized device designs requires exploring an extensive parameter space, but the high computational cost of conventional Technology Computer-Aided Design (TCAD) tools makes such wide-scale optimization impractical. To overcome these simulation barriers, we present a Physics-Informed Neural Operator (PINO)-based AI surrogate model designed for high-efficiency prediction of threshold voltage (Vth) shifts and retention behavior. By embedding fundamental physical principles into the learning architecture, our PINO framework achieves a speedup exceeding 10000x compared to TCAD while maintaining physical accuracy. This study demonstrates the model's effectiveness on a single FeFET configuration, serving as a pathway toward modeling the retention loss mechanisms.
LOApr 24
Dynamic Planar Graph Isomorphism is in DynFOSamir Datta, Asif Khan, Felix Tschirbs et al.
Consider two planar graphs which are subject to edge insertions and deletions. We show that whether the two graphs are isomorphic can be maintained with first-order logic formulas and auxiliary data of polynomial size. This places the dynamic planar graph isomorphism problem into the dynamic descriptive complexity class DynFO. As a consequence, there is a dynamic constant-time parallel algorithm with polynomial-size auxiliary data which maintains whether two dynamic planar graphs are isomorphic.
HCJan 19, 2022Code
Real-Time Gaze Tracking with Event-Driven Eye SegmentationYu Feng, Nathan Goulding-Hotta, Asif Khan et al.
Gaze tracking is increasingly becoming an essential component in Augmented and Virtual Reality. Modern gaze tracking al gorithms are heavyweight; they operate at most 5 Hz on mobile processors despite that near-eye cameras comfortably operate at a r eal-time rate ($>$ 30 Hz). This paper presents a real-time eye tracking algorithm that, on average, operates at 30 Hz on a mobile processor, achieves \ang{0.1}--\ang{0.5} gaze accuracies, all the while requiring only 30K parameters, one to two orders of magn itude smaller than state-of-the-art eye tracking algorithms. The crux of our algorithm is an Auto~ROI mode, which continuously pr edicts the Regions of Interest (ROIs) of near-eye images and judiciously processes only the ROIs for gaze estimation. To that end, we introduce a novel, lightweight ROI prediction algorithm by emulating an event camera. We discuss how a software emulation of events enables accurate ROI prediction without requiring special hardware. The code of our paper is available at https://github.com/horizon-research/edgaze.
OTFeb 21, 2025
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligenceYingying Sun, Jun A, Zhiwei Liu et al.
Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
LGMar 22, 2025
Neural Network Operator-Based Fractal Approximation: Smoothness Preservation and Convergence AnalysisAaqib Ayoub Bhat, Asif Khan, M. Mursaleen
This paper presents a new approach of constructing $α$-fractal interpolation functions (FIFs) using neural network operators, integrating concepts from approximation theory. Initially, we construct $α$-fractals utilizing neural network-based operators, providing an approach to generating fractal functions with interpolation properties. Based on the same foundation, we have developed fractal interpolation functions that utilize only the values of the original function at the nodes or partition points, unlike traditional methods that rely on the entire original function. Further, we have constructed \(α\)-fractals that preserve the smoothness of functions under certain constraints by employing a four-layered neural network operator, ensuring that if \(f \in C^{r}[a,b]\), then the corresponding fractal \(f^α \in C^{r}[a,b]\). Furthermore, we analyze the convergence of these $α$-fractals to the original function under suitable conditions. The work uses key approximation theory tools, such as the modulus of continuity and interpolation operators, to develop convergence results and uniform approximation error bounds.
BMJan 29, 2022
AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian OptimisationAsif Khan, Alexander I. Cowen-Rivers, Antoine Grosnit et al.
Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies. However, the combinatorial nature of CDRH3 sequence space makes it impossible to search for an optimal binding sequence exhaustively and efficiently using computational approaches. Here, we present \texttt{AntBO}: a combinatorial Bayesian optimisation framework enabling efficient \textit{in silico} design of the CDRH3 region. Ideally, antibodies are expected to have high target specificity and developability. We introduce a CDRH3 trust region that restricts the search to sequences with favourable developability scores to achieve this goal. For benchmarking, \texttt{AntBO} uses the \texttt{Absolut!} software suite as a black-box oracle to score the target specificity and affinity of designed antibodies \textit{in silico} in an unconstrained fashion~\citep{robert2021one}. The experiments performed for $159$ discretised antigens used in \texttt{Absolut!} demonstrate the benefit of \texttt{AntBO} in designing CDRH3 regions with diverse biophysical properties. In under $200$ calls to black-box oracle, \texttt{AntBO} can suggest antibody sequences that outperform the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s and a commonly used genetic algorithm baseline. Additionally, \texttt{AntBO} finds very-high affinity CDRH3 sequences in only 38 protein designs whilst requiring no domain knowledge. We conclude \texttt{AntBO} brings automated antibody design methods closer to what is practically viable for in vitro experimentation.
CVDec 2, 2021
Hamiltonian latent operators for content and motion disentanglement in image sequencesAsif Khan, Amos Storkey
We introduce \textit{HALO} -- a deep generative model utilising HAmiltonian Latent Operators to reliably disentangle content and motion information in image sequences. The \textit{content} represents summary statistics of a sequence, and \textit{motion} is a dynamic process that determines how information is expressed in any part of the sequence. By modelling the dynamics as a Hamiltonian motion, important desiderata are ensured: (1) the motion is reversible, (2) the symplectic, volume-preserving structure in phase space means paths are continuous and are not divergent in the latent space. Consequently, the nearness of sequence frames is realised by the nearness of their coordinates in the phase space, which proves valuable for disentanglement and long-term sequence generation. The sequence space is generally comprised of different types of dynamical motions. To ensure long-term separability and allow controlled generation, we associate every motion with a unique Hamiltonian that acts in its respective subspace. We demonstrate the utility of \textit{HALO} by swapping the motion of a pair of sequences, controlled generation, and image rotations.