Deciphering antibody affinity maturation with language models and weakly supervised learning
This work offers insights into immune repertoires that could aid therapeutic antibody discovery, but it is incremental: it applies existing language-model approaches to a specific domain.
The researchers tackled the problem of understanding antibody affinity maturation by developing AntiBERTy, a language model trained on 558M natural antibody sequences. Within repertoires, the model clusters antibodies into trajectories resembling affinity maturation. In addition, models trained with weakly supervised (multiple instance) learning to predict highly redundant sequences identify key binding residues.
In response to pathogens, the adaptive immune system generates specific antibodies that bind and neutralize foreign antigens. Understanding the composition of an individual's immune repertoire can provide insights into this process and reveal potential therapeutic antibodies. In this work, we explore the application of antibody-specific language models to aid understanding of immune repertoires. We introduce AntiBERTy, a language model trained on 558M natural antibody sequences. We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation. Importantly, we show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process. With further development, the methods presented here will provide new insights into antigen binding from repertoire sequences alone.
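The abstract does not spell out how the multiple instance learning (MIL) setup works. A common formulation of such a setup works as follows:

- Each antibody sequence is treated as a bag of per-residue embeddings (for example, taken from a language model such as AntiBERTy).
- The only supervision is a sequence-level label, such as whether the sequence is highly redundant.
- Attention pooling over the residues produces per-residue weights, which can then be inspected for putative binding residues.

The sketch below assumes attention-based MIL pooling in the style of Ilse et al. (2018), simplified to a single linear scoring vector. It is not necessarily the authors' architecture. All function names, weights, and toy embeddings are hypothetical, and training is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(instances: Sequence[Vector], w_attn: Vector) -> Tuple[Vector, List[float]]:
    """Pool a bag of per-residue embeddings into one sequence embedding.

    Each residue (instance) gets a scalar score from a linear projection
    (hypothetical weights w_attn, a simplification of Ilse et al.'s tanh
    scoring network). The scores are softmax-normalized into attention
    weights, and the bag embedding is the attention-weighted average.
    """
    attn = softmax([dot(h, w_attn) for h in instances])
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn


def predict_bag(instances: Sequence[Vector], w_attn: Vector, w_clf: Vector,
                bias: float = 0.0) -> Tuple[float, List[float]]:
    """Return P(sequence is highly redundant) and the per-residue attention.

    Only this bag-level probability would be supervised during training
    (weak supervision); the attention weights are a by-product.
    """
    bag, attn = attention_mil_pool(instances, w_attn)
    logit = dot(bag, w_clf) + bias
    return 1.0 / (1.0 + math.exp(-logit)), attn


def top_residues(attn: Sequence[float], k: int) -> List[int]:
    # Indices of the k residues with the highest attention weight.
    return sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]


# Toy example: four residues with 2-d embeddings and hand-set weights.
residues = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, 0.5]]
prob, attn = predict_bag(residues, w_attn=[1.0, 0.5], w_clf=[1.0, -1.0])
print(top_residues(attn, 2))  # → [1, 3]
```

In a real pipeline, `w_attn` and `w_clf` would be learned from sequence-level labels alone. Residues that receive high attention would then be candidates for comparison against known binding sites. This is how a MIL model can surface residue-level signal without residue-level annotation.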