Jakub Otwinowski

h-index12

3papers

32citations

Novelty60%

AI Score30

Ranked #137,384 of 194,257 authors (top 71%)#12 in PE (top 44%)

3 Papers

5.1BIO-PHNov 5, 2022

Learning the shape of protein micro-environments with a holographic convolutional neural network

Michael N. Pun, Andrew Ivanov, Quinn Bellamy et al.

Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from structure remains a major challenge. Here, we introduce Holographic Convolutional Neural Network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.

2.3PEMay 4, 2023

Contrastive losses as generalized models of global epistasis

David H. Brookes, Jakub Otwinowski, Sam Sinai

Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is transformed by a monotonic nonlinearity to emit measurable fitness. Here we demonstrate that minimizing supervised contrastive loss functions, such as the Bradley-Terry loss, is a simple and flexible technique for extracting the sparse latent function implied by global epistasis. We argue by way of a fitness-epistasis uncertainty principle that the nonlinearities in global epistasis models can produce observed fitness functions that do not admit sparse representations, and thus may be inefficient to learn from observations when using a Mean Squared Error (MSE) loss (a common practice). We show that contrastive losses are able to accurately estimate a ranking function from limited data even in regimes where MSE is ineffective and validate the practical utility of this insight by demonstrating that contrastive loss functions result in consistently improved performance on benchmark tasks.

5.1PEDec 6, 2019Code

Information-geometric optimization with natural selection

Jakub Otwinowski, Colin LaMont

Evolutionary algorithms, inspired by natural evolution, aim to optimize difficult objective functions without computing derivatives. Here we detail the relationship between population genetics and evolutionary optimization and formulate a new evolutionary algorithm. Optimization of a continuous objective function is analogous to searching for high fitness phenotypes on a fitness landscape. We summarize how natural selection moves a population along the non-euclidean gradient that is induced by the population on the fitness landscape (the natural gradient). Under normal approximations common in quantitative genetics, we show how selection is related to Newton's method in optimization. We find that intermediate selection is most informative of the fitness landscape. We describe the generation of new phenotypes and introduce an operator that recombines the whole population to generate variants that preserve normal statistics. Finally, we introduce a proof-of-principle algorithm that combines natural selection, our recombination operator, and an adaptive method to increase selection. Our algorithm is similar to covariance matrix adaptation and natural evolutionary strategies in optimization, and has similar performance. The algorithm is extremely simple in implementation with no matrix inversion or factorization, does not require storing a covariance matrix, and may form the basis of more general model-based optimization algorithms with natural gradient updates.