Language modelling techniques for analysing the impact of human genetic variation
This work synthesizes existing research on computational variant effect prediction for biomedical researchers and practitioners. As a review paper, its contribution is incremental: it consolidates prior work rather than presenting new methods or results.
This review paper examines how language modelling techniques have been applied over the past decade to predict the effects of human genetic variants, a task that is crucial for disease risk analysis and personalised medicine. It highlights the significant advancements brought by Transformer models and their subsequent extensions.
Interpreting the effects of variants within the human genome and proteome is essential for analysing disease risk, predicting medication response, and developing personalised health interventions. Due to the intrinsic similarities between the structure of natural languages and genetic sequences, natural language processing techniques have demonstrated great applicability in computational variant effect prediction. In particular, the advent of the Transformer has led to significant advancements in the field. However, Transformer-based models are not without their limitations, and a number of extensions and alternatives have been developed to improve results and enhance computational efficiency. This review explores the use of language models for computational variant effect prediction over the past decade, analysing the main architectures and identifying key trends and future directions.