The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-art prediction of missense and indel effects and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.

Designing and generating biomolecules with known functions is now a major goal of biotechnology and biomedicine, propelled by our ability to synthesize and sequence DNA at increasingly low costs. Over the past 20 years, success in protein engineering has emerged from two distinct approaches, directed evolution [1, 2] and knowledge-based force-field modeling [3, 4]. However, since the space of possible protein sequences is so large (for a protein of length 100 this is 10^130), deep mutational scans [5] and even very large libraries (e.g., >10^10 variants) barely scratch the surface of the possibilities. As the vast majority of possible sequences will be non-functional proteins, it is crucial to minimize or eliminate these sequences from libraries. Therefore, the open challenge is to develop computational methods that can accelerate this search and bias the search space for protein sequences that are likely to be functional. This will enable the design of libraries for tractable high-throughput experiments that are optimized for functional sequences and variants that are distant in sequence.

Antibody design is a particularly challenging problem in the area of statistical modeling of sequences for the purposes of prediction and design. Antibodies are valuable tools for molecular biology and therapeutics because they can detect low concentrations of target antigens with high sensitivity and specificity [6]. Single-domain antibodies, or nanobodies, are composed solely of the variable domain of the canonical antibody heavy chain. The increasing demand for and success with the rapid and efficient discovery of novel nanobodies using phage and yeast display methods [7, 8, 9, 10] have spurred interest in the design of optimal starting libraries. Previous statistical and structural modeling of antibody repertoires [11, 12, 13, 14, 15, 16, 17, 18] have addressed the characterization of sequences of natural antibodies or predicted higher affinity sequences from immunization or selection experiments. One of the biggest challenges is to design libraries diverse enough to target many antigens but also be well-expressed, stable, and non-poly-reactive.
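The sequence-space figure quoted in the text (about 10^130 for a protein of length 100) can be checked with a few lines of arithmetic. The sketch below is illustrative only and assumes the standard 20-letter amino-acid alphabet, which the text does not state explicitly. The function names are hypothetical and are not taken from the paper.

```python
import math

# Assumption: the standard 20 canonical amino acids, no non-canonical residues.
NUM_AMINO_ACIDS = 20


def log10_sequence_space(length: int, alphabet_size: int = NUM_AMINO_ACIDS) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


def log10_fraction_covered(library_size: float, length: int) -> float:
    """Return log10 of the fraction of sequence space a library of this size could sample."""
    return math.log10(library_size) - log10_sequence_space(length)


if __name__ == "__main__":
    space = log10_sequence_space(100)
    covered = log10_fraction_covered(1e10, 100)
    print(f"log10(number of length-100 sequences) = {space:.1f}")
    print(f"log10(fraction covered by 1e10 variants) = {covered:.1f}")
```

Under this assumption the exponent comes out at roughly 130, and a library of 10^10 variants samples on the order of 10^-120 of the space, which is the sense in which even very large libraries "barely scratch the surface".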