Recent advances in interpretable machine learning using structure-based protein representations
This is an incremental survey that benefits researchers in structural biology, drug development, and protein design by summarizing existing interpretable ML approaches.
The paper surveys methods for representing protein 3D structures and applying interpretable machine learning to tasks like structure prediction, function prediction, and protein-protein interactions, aiming to enhance interpretability and knowledge discovery in structural biology.
Recent advancements in machine learning (ML) are transforming the field of structural biology. For example, AlphaFold, a groundbreaking neural network for protein structure prediction, has been widely adopted by researchers. The availability of easy-to-use interfaces and of interpretable outcomes from the neural network architecture, such as the confidence scores used to color the predicted structures, has made AlphaFold accessible even to non-ML experts. In this paper, we present various methods for representing protein 3D structures, from low to high resolution, and show how interpretable ML methods can support tasks such as predicting protein structures, protein function, and protein-protein interactions. This survey also emphasizes the significance of interpreting and visualizing ML-based inference for structure-based protein representations that enhance interpretability and knowledge discovery. Developing such interpretable approaches promises to further accelerate fields including drug development and protein design.