Attention-based Multi-task Learning for Base Editor Outcome Prediction
This work addresses the need for faster, more efficient base editing design in genetic disease research, although the contribution is incremental because it builds on existing machine learning approaches.
The paper tackles low efficiency and unintended mutations in base editing by developing an attention-based, two-stage machine learning model with multi-task learning that predicts editing outcomes. Its predictions correlate strongly with experimental results across multiple datasets and base editor variants.
Human genetic diseases often arise from point mutations, which makes precise genome editing techniques critical. Among these, base editing stands out because it allows targeted alterations at the single-nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, which necessitate extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning scheme to jointly learn multiple base editors (i.e., variants) at once. Our model's predictions consistently demonstrated a strong correlation with the actual experimental results on multiple datasets and base editor variants. These results provide further validation of the model's capacity to enhance and accelerate the process of refining base editing designs.
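The abstract does not spell out what the two stages compute, so the following is only a minimal sketch under a stated assumption: the first stage scores whether the target is edited at all, and the second stage scores each candidate edited outcome given that an edit occurred. The function `combine_two_stage`, the editor names, and all logit values are hypothetical and stand in for the outputs of the attention-based network, which is not reproduced here. The per-editor dictionary loosely mirrors the multi-task idea of one prediction head per base editor variant.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_two_stage(edit_logit, outcome_logits):
    """Combine the two assumed stages into one distribution.

    Returns probabilities over [unedited, outcome_1, ..., outcome_k]:
      P(unedited)  = 1 - P(edited)
      P(outcome_i) = P(edited) * P(outcome_i | edited)
    """
    p_edited = sigmoid(edit_logit)           # assumed stage 1
    conditional = softmax(outcome_logits)    # assumed stage 2
    return [1.0 - p_edited] + [p_edited * p for p in conditional]


# Hypothetical logits, one (stage 1, stage 2) pair per base editor variant.
# In a multi-task setup these would come from editor-specific heads that
# share a common sequence encoder; here they are fixed illustrative values.
per_editor_logits = {
    "editor_A": (0.0, [1.0, 1.0]),
    "editor_B": (2.0, [0.5, -0.5, 0.0]),
}

predictions = {
    name: combine_two_stage(edit_logit, outcome_logits)
    for name, (edit_logit, outcome_logits) in per_editor_logits.items()
}
```

Under this factorization, each editor's predicted probabilities sum to one by construction, so the output can be compared directly against observed outcome proportions for a target sequence.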