Natural Language Processing Methods for the Study of Protein-Ligand Interactions
This review addresses the problem of predicting protein-ligand interactions for drug discovery and protein engineering. Its contribution is incremental: it surveys existing methods without introducing new results.
This review explores how Natural Language Processing (NLP) methods are applied to predict protein-ligand interactions, leveraging parallels between human languages and biochemical data to advance drug discovery and protein engineering.
Recent advances in Natural Language Processing (NLP) have ignited interest in developing effective methods for predicting protein-ligand interactions (PLIs), both because PLIs are relevant to drug discovery and protein engineering and because the volume of available biochemical sequence and structural data keeps growing. The parallels between human languages and the "languages" used to represent proteins and ligands have enabled the use of NLP machine learning approaches to advance PLI studies. In this review, we explain where and how such approaches have been applied in the recent literature and discuss useful architectures and mechanisms such as long short-term memory, transformers, and attention. We conclude with a discussion of the current limitations of NLP methods for the study of PLIs as well as key challenges that need to be addressed in future work.
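To make the language analogy concrete, the sketch below shows one way protein sequences and ligand strings can be turned into token IDs, the form of input that NLP models consume. It is an illustration rather than a method taken from the review: it assumes proteins are written as one-letter amino acid sequences and ligands as SMILES strings, and the function names (`tokenize_protein`, `tokenize_smiles`, `build_vocab`, `encode`) and the simplified SMILES pattern are hypothetical.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Simplified SMILES token pattern (an assumption for illustration; it does not
# cover the full SMILES grammar). Bracket atoms and two-letter halogens are
# matched before single characters so they stay intact as one token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:]")


def tokenize_protein(sequence):
    """Split a protein sequence into one token per residue."""
    tokens = list(sequence.upper())
    unknown = sorted({t for t in tokens if t not in AMINO_ACIDS})
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return tokens


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, ring, and branch tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject strings containing characters the simplified pattern skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to the integer IDs a sequence model would take as input."""
    return [vocab[token] for token in tokens]


if __name__ == "__main__":
    protein_tokens = tokenize_protein("MKTAYIAK")
    ligand_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    protein_vocab = build_vocab([protein_tokens])
    ligand_vocab = build_vocab([ligand_tokens])
    print(encode(protein_tokens, protein_vocab))
    print(encode(ligand_tokens, ligand_vocab))
```

Separate vocabularies are kept for the two "languages" here because residues and SMILES symbols share letters with different meanings (for example, `C` is cysteine in a protein but carbon in a ligand). Models in the literature may instead use learned subword vocabularies or structural representations.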