Practical Approaches to Genetic Code Expansion with Aminoacyl-tRNA Synthetase/tRNA Pairs

Genetic code expansion (GCE) can enable the site-selective incorporation of non-canonical amino acids (ncAAs) into proteins. GCE has advanced tremendously in the last decade and can be used to create bioorthogonal handles, monitor and control proteins inside cells, study post-translational modifications, and engineer new protein functions. Since establishing our laboratory, our research has focused on applications of GCE in protein and enzyme engineering using aminoacyl-tRNA synthetase/tRNA (aaRS/tRNA) pairs. This topic has been reviewed extensively, leaving little doubt that GCE is a powerful tool for engineering proteins and enzymes. Therefore, for this young faculty issue, we wanted to provide a more technical look into the methods we use and the challenges we think about in our laboratory. Since starting the laboratory, we have successfully engineered over a dozen novel aaRS/tRNA pairs tailored for various GCE applications. However, we acknowledge that the field can pose challenges even for experts. Thus, herein, we provide a review of methodologies in ncAA incorporation with some practical commentary and a focus on challenges, emerging solutions, and exciting developments.


Introduction
Nature uses a genetic code that is highly redundant, employing 64 codons to encode only 20 proteinogenic amino acids (prAAs) and a translation termination signal. For decades, this redundancy has been leveraged to reassign codons to encode non-canonical amino acids (ncAAs). Genetic code expansion (GCE) with ncAAs has enabled the generation of proteins and enzymes with new functionalities, and such research has facilitated discoveries in multiple disciplines, particularly chemistry, biology, and synthetic biology. [31][32][33][34][35][36] An illustration of these different applications is provided in Fig. 1A. The ability to install bioorthogonal handles, fluorophores, PTMs, and cleavable protecting groups has enabled the rich study of protein function in whole cells and the biochemical analysis of proteins. In contrast, the ability to engineer proteins and enzymes for native or non-native function with ncAAs is still a developing methodology but has the potential to transform directed evolution, biocatalysis, and synthetic biology.
The most widely employed method for GCE uses engineered aminoacyl-tRNA synthetase (aaRS) and tRNA pairs to selectively incorporate ncAAs. Methods for aaRS/tRNA engineering have been described and refined over more than 30 years. However, the implementation of such methods is often the source of discussion and debate, even among experts. Additionally, the application of engineered aaRS/tRNA pairs can sometimes pose challenges that have spawned sub-fields within GCE. In this review, we will discuss the process of engineering aaRS/tRNA pairs, the practical application of engineered aaRS/tRNA pairs, and exciting recent advances. Our goal is to provide a snapshot of methods and considerations in the workflow of aaRS/tRNA-mediated GCE.

Aminoacyl-tRNA Synthetase/tRNA Pair Identification
For ncAA incorporation with aaRS/tRNA pairs, the ncAA is co-translationally incorporated into a nascent polypeptide chain at a site of interest in response to a recoded codon on the mRNA, typically the amber stop codon UAG (Fig. 1B). [40] To ensure specificity and fidelity, the aaRS/tRNA pair for ncAA incorporation must be orthogonal to the host (indicated as o-aaRS/tRNA). In a fully Orthogonal Translation System (OTS), the o-aaRS discriminates against the endogenous proteinogenic amino acids and the endogenous tRNAs of the host, and specifically aminoacylates the o-tRNA with the ncAA. The correct aminoacylation of tRNAs by their cognate aaRSs predominantly relies on interactions between the aaRS and a limited number of nucleotides of the tRNA called 'identity elements'. [41,42] Major identity elements are often, but not always, conserved between Bacteria, Archaea, and Eukarya. Several robust o-aaRS/tRNA pairs have been engineered by exploiting natural differences in the recognition of identity elements between organisms. As a result, the orthogonality of a given o-aaRS/tRNA pair is dependent on the organism in which it is used.

Aminoacyl-tRNA Synthetase/tRNA Pair Selection
In Escherichia coli (E. coli), several amber suppressor o-aaRS/tRNA pairs have been successfully adapted from archaeal and eukaryotic aaRS/tRNA pairs. These include the pyrrolysyl o-aaRS/tRNA (PylRS/PylT CUA) pair from methanogens like Methanosarcina barkeri [43,44] or Methanosarcina mazei, [45] which have been used to incorporate >100 diverse ncAAs; the tyrosyl o-aaRS/tRNA pairs from Methanocaldococcus jannaschii [40] and Archaeoglobus fulgidus, [46] which have been used to incorporate >50 tyrosine analogs; a lysyl o-aaRS/tRNA pair from Pyrococcus horikoshii; [47] and the phenylalanyl [48] and tryptophanyl [49] o-aaRS/tRNA pairs from Saccharomyces cerevisiae (S. cerevisiae). In S. cerevisiae and mammalian cells, mainly the archaeal PylRS/PylT CUA pair and the bacterial tyrosyl [50] and leucyl [18] o-aaRS/tRNA pairs from E. coli are used. Although there is a plethora of o-aaRS/tRNA pairs available, in practice the choice of OTS is typically dictated by the availability of a published o-aaRS/tRNA pair for a given ncAA that works in the desired organism. Recently, an extensive database was created that catalogs applications of o-aaRS/tRNA pairs for ncAA incorporation, facilitating the search for suitable pairs. [51][54][55] Promiscuity is a common feature of engineered o-aaRS/tRNA pairs because these pairs are rarely evolved to discriminate against other ncAAs. This feature can be highly advantageous; however, repurposing of promiscuous pairs often does not lead to the most efficient incorporation. Thus, typically, if a novel ncAA is to be incorporated, a suitable o-aaRS/tRNA pair needs to be identified by o-aaRS/tRNA engineering.

Aminoacyl-tRNA Synthetase/tRNA Engineering
The starting point for o-aaRS/tRNA engineering depends firstly on the organism of application. The first choice is typically the PylRS/PylT CUA pair, as it can be conveniently engineered in E. coli and then applied in both E. coli and eukaryotic cells. Alternatively, if E. coli is the only target organism and the ncAA is a tyrosine derivative, the TyrS/TyrT CUA pair from M. jannaschii can also be chosen.
The classical engineering workflow to reprogram the aminoacylation specificity towards a new ncAA typically entails the steps outlined in Fig. 2. First, a library of aaRS variants is prepared. Here, the method of choice is multi-site saturation mutagenesis (SSM) at 3-6 residues of the substrate binding pocket. Commonly used positions are L305/L309/N346/C348/W417 for the PylRS from M. mazei and Y32/E107/D158/I159/L162 for TyrRS from M. jannaschii. Crystal structures of both enzymes are available (2ZCE [56] for PylS and 1J1U [57] for TyrRS), also in complex with different ncAA ligands (e.g. 4CH6 [58] for PylS and 1U7X [59] for TyrRS from M. jannaschii). In our experience, the generation of a few libraries covering 3-5 residues in different combinations has yielded satisfactory results on a range of ncAAs. Large (>10^7 variants) SSM libraries can be routinely generated in vitro using enzymatic inverse PCR or in vivo using multiplex automated genome engineering by lambda red recombineering (MAGE). [60,61] As an alternative to SSM, random mutagenesis can be carried out throughout the o-aaRS, using e.g. error-prone polymerase chain reaction (epPCR) with Taq polymerase, manganese, and unbalanced dNTP ratios. [62] Even though very large libraries (<10^10 variants) can be generated by this approach, we typically observed lower success rates for the de novo discovery of o-aaRS specific for a new ncAA. Thus, we often opt to use SSM libraries. No matter the type of library chosen, sufficient coverage and quality of the o-aaRS library are essential. Thus, during library preparation and selection steps, colony forming units (CFUs) should be monitored so as not to artificially bottleneck the genetic diversity of the o-aaRS variants. During library creation and the initial transformation step, one should aim to obtain approximately 3-fold the number of CFUs compared to the possible DNA variants to yield 95% library coverage. [63,64] Notably, this estimate assumes an unbiased DNA library in which all variants are equally frequently represented. Often, DNA sequencing of individual clones or next-generation sequencing can help to assess library quality.
Once one or more libraries are obtained, a positive selection step in the presence of the ncAA is carried out using an antibiotic resistance gene containing an amber stop codon at a permissive position (typically chloramphenicol acetyltransferase carrying a D111 TAG mutation), allowing variants to survive that aminoacylate the o-tRNA with either the ncAA or one of the 20 prAAs. [65] It must be noted that the selection conditions chosen during the first positive step will decide the minimal o-aaRS activity level. Choosing conditions that are too stringent, i.e. too high antibiotic concentrations or too low ncAA concentrations, can eliminate desirable o-aaRS, leading to false negatives. Next, the cells surviving the positive selection are collected, the DNA is extracted, and a negative selection step in the absence of the ncAA is carried out using a toxic gene, such as barnase, [65] CcdB, [66] or TolC, [61] carrying one or more amber stop codons at permissive positions. Variants that aminoacylate the o-tRNA with a prAA express the toxic gene and are killed. In our experience, negative selection using a barnase gene carrying two amber mutations (at e.g. Gln3 TAG and Asp45 TAG) works robustly in eliminating all but very weakly cross-reactive variants. Rounds of positive and negative selection are completed iteratively to reduce the frequency of false positives. Finally, a positive screening step in the presence of the ncAA is carried out, and individual colonies are subcultured and screened in the absence and presence of the ncAA using the green fluorescent protein (GFP) containing an internal amber stop codon. Clones with a high GFP expression in the presence compared to the absence of the ncAA should then be screened by liquid chromatography-mass spectrometry (LC-MS) to identify cells expressing an o-aaRS/tRNA pair that is efficient and selective for the desired ncAA.
For many research groups, the classical workflow remains the method of choice. It is quite robust, but laborious, as positive and negative selections are iteratively carried out, typically over 3-5 rounds. Additionally, the workflow has some limitations. First, it is often only practical to test a few selection conditions in parallel, but many variables can affect the selection outcome, including antibiotic concentration, toxic gene expression level, and ncAA concentration. When the manufacture of the ncAA is expensive, even testing two or three conditions over iterative rounds of selection can be burdensome. Thus, potential hits may remain elusive owing to bias in the limited conditions tested. Additionally, when using negative selections, o-aaRS hits with a low level of background incorporation of a prAA can be lost, even though the addition of sufficient ncAA could outcompete the background incorporation of the prAA. Omission of the negative selection step can thus lead to the identification of o-aaRS/tRNA pairs that have some level of background incorporation (typically tyrosine for TyrRS or phenylalanine for PylRS) but highly efficient activity in the presence of the ncAA. Due to these limitations, some research groups have developed alternative screening and selection workflows (Fig. 3A-D).
An alternative to sequential positive and negative selections is parallel positive and negative selections coupled to deep sequencing and statistical analysis (Fig. 3A). [16] In these assays, a positive selection, such as that in Fig. 2, is performed in the presence and absence of the target ncAA. Subsequent deep sequencing and statistical analysis can be used to identify differentially enriched o-aaRS variants. The most promising candidates are then individually cloned and tested for efficiency and selectivity. This approach can overcome some limitations of negative selections and identify o-aaRS variants even when background incorporation cannot be avoided. Notably, this approach incurs high costs during deep sequencing. Zhang et al. [16] successfully used parallel selections to genetically encode biosynthesized phosphothreonine in E. coli. Due to high phosphoserine background incorporation by the chosen o-aaRS system, the authors opted for parallel selection and successfully identified a variant that enabled quantitative phosphothreonine incorporation.
In E. coli, phage-assisted continuous evolution (PACE) has been leveraged to greatly improve the activity and orthogonality of o-aaRS/tRNA pairs (Fig. 3B). [67,68] During PACE, the propagation of the M13 bacteriophage is linked to the desired function of a biomolecule, which drives the expression of the gIII gene that encodes the pIII coat protein of the M13 bacteriophage. In o-aaRS/tRNA engineering, the propagation of the phage can conveniently be linked to the suppression of one or more amber stop codons in either a T7 RNA polymerase that drives the expression of pIII or pIII itself. [67] The o-aaRS is typically encoded on the selection phage (SP), while the other components are encoded on accessory and complementary plasmids (AP and CP). Genetic diversity is introduced using a range of powerful mutagenesis plasmids (MP). [69] Nonetheless, current PACE implementations are not optimally suited for the de novo selection of o-aaRS/tRNA pairs, as multiple amino acid changes may be necessary in a single step. It is conceivable that this could be mitigated by seeding PACE selections with o-aaRS libraries generated from SSM, in which PACE enriches functional o-aaRS mutations from the SSM diversity and explores an additional layer of mutations through continuous evolution.
As an alternative to live/death selections, Fluorescence-Activated Cell Sorting (FACS) approaches leveraging a GFP reporter have been implemented in both E. coli and S. cerevisiae [70] (Fig. 3C). Starting from an initial library of o-aaRS variants, cell populations expressing high GFP in the presence of the ncAA and low GFP in the absence of the ncAA can be gated. o-aaRS variants are identified over several rounds of sorting and amplification. Lastly, individual clones are tested. FACS-based screening also has several limitations. First, the throughput is generally limited to 10^8 cells/h, limiting the library sizes that can be readily screened with complete coverage. Additionally, depending on the research facilities, FACS remains inaccessible to some researchers due to limitations in equipment or prohibitive costs, considering that for the screening of a single library, several sorts should be carried out in series.
Finally, some bioinformatic approaches have also been implemented to incorporate new ncAAs (Fig. 3D). Using a dataset of 285 combinations of o-aaRS/substrate pairs, a recent report implemented a machine-learning model to expand the substrate scope of three PylRS variants. [71] Although this approach is still in its infancy, it could become increasingly feasible as more data on o-aaRS/substrate combinations are collected.
Thus, several routes are available for o-aaRS/tRNA pair engineering. The selected methodology for a given engineering campaign depends largely on the suitability of negative selection and access to instrumentation or techniques. Notably, the method of choice for most groups remains the classical engineering workflow or small modifications thereof. However, this bias may primarily result from its relative ease of implementation and lack of reliance on specialized equipment or expensive methodologies.
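The library coverage guideline given in this section (approximately 3-fold more CFUs than possible DNA variants for 95% coverage) is consistent with simple Poisson sampling of an unbiased library. The following Python sketch illustrates the arithmetic. It assumes NNK degenerate codons (32 codons per randomized position), which is an assumption made here for illustration and is not specified in the text; all function names are likewise hypothetical.

```python
import math


def library_size(n_positions: int, codons_per_position: int = 32) -> int:
    """Number of distinct DNA variants in a saturation mutagenesis library.

    The default of 32 codons per position assumes NNK degeneracy
    (an illustrative assumption, not taken from the text).
    """
    return codons_per_position ** n_positions


def expected_coverage(n_cfu: float, n_variants: float) -> float:
    """Expected fraction of variants sampled at least once.

    Uses the Poisson approximation for an unbiased library in which
    all variants are equally represented.
    """
    return 1.0 - math.exp(-n_cfu / n_variants)


def cfus_for_coverage(n_variants: float, coverage: float = 0.95) -> float:
    """CFUs needed for the expected coverage to reach `coverage`."""
    return -n_variants * math.log(1.0 - coverage)


if __name__ == "__main__":
    for n_pos in (3, 4, 5, 6):
        variants = library_size(n_pos)
        needed = cfus_for_coverage(variants)
        print(f"{n_pos} positions: {variants:.2e} variants, "
              f"{needed:.2e} CFUs for 95% expected coverage")
```

Under these assumptions, 95% expected coverage requires about 3.0 CFUs per variant, and the required number of transformants grows 32-fold with each additional randomized position. Biased libraries will need more.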

Troubleshooting an Unsuccessful Engineering Campaign
Several factors that are unrelated to the chosen mutagenesis and screening/selection strategy can influence the outcome of an o-aaRS/tRNA engineering campaign. First, the expression system for the OTS might be suboptimal. An emerging link between o-aaRS thermostability and evolvability hints that a combination of destabilizing mutations can reduce the expression level of the o-aaRS, leading to the loss of these variants during selection. [72] Conversely, strong overexpression of the OTS components can lead to high background incorporation. [73] Highly thermostable homologs of commonly used o-aaRS offer an attractive alternative and have been shown to accommodate more destabilizing mutations within the o-aaRS.
A limiting factor during ncAA incorporation can be the intracellular concentration of the ncAA, which may not reflect the ncAA concentrations that are added to the growth media. [16,74,75] Low intracellular concentrations can result from low cellular uptake, which can be the case for highly negatively charged or polar ncAAs, or from metabolism of the ncAA in the cell. Intracellular ncAA concentrations can be determined for most ncAAs using LC-MS. [16] Derivatization of the ncAA can be applied to improve its uptake. Commonly used strategies include esterification of the carboxyl group and delivery of the ncAA as a dipeptide. [76] Alternatively, endogenous non-specific amino acid transporters can be overexpressed. If the ncAA cannot be accumulated sufficiently in the cell due to metabolism, endogenous enzymes that degrade the ncAA can be silenced or knocked out using genome editing techniques. [15] Likewise, if a suitable biosynthetic route is available, the biosynthesis machinery can be co-expressed in the host organism and exogenous addition of ncAAs can be avoided. [16,77,78] In rare cases, if the ncAA is highly similar to a prAA, host enzymes can be exploited to metabolize an ncAA precursor. [79]
Even though an ncAA might be sufficiently abundant intracellularly and efficiently aminoacylated by an o-aaRS, efficient translation might still be hindered by factors relating to the translational machinery. First, an o-tRNA aminoacylated with an ncAA might have limited affinity to elongation factor thermo unstable (EF-Tu), the protein responsible for the transport of aminoacylated tRNAs into the decoding center of the ribosome. [80] In certain cases, this problem can be overcome by engineering EF-Tu. [81] A plasmid for the expression of engineered variants of EF-Tu can be added to cells to improve incorporation.
ncAA incorporation can also fail due to a lack of orthogonality of the ncAA towards the host translational machinery, as some ncAAs are natural substrates for endogenous aaRS. [82] Loading of the ncAA by a host aaRS onto an endogenous tRNA can lead to mistranslation at sense codons, leading to toxicity. If this is the case, in vivo incorporation can prove difficult. Proteins with site-specific incorporation could be obtained through an alternative approach, such as the incorporation of a photo- or chemically-protected version of the ncAA or the site-specific incorporation of a bioorthogonal modification handle that can be post-translationally reacted to yield the side chain of choice. [83,84]
Finally, the ncAA might be strictly incompatible with the host translational machinery due to physical limitations. These include poor accommodation of the ncAA in the decoding center of the ribosome due to its size, or improper conformational positioning of the ncAA that prevents or impedes nucleophilic attack by the aminoacyl-tRNA on the peptidyl-tRNA carbonyl. [85] For polymer building blocks with exotic backbone structures, reduced ribosomal efficiency is often reported. [86,87][90][91] When a mutant ribosome is necessary, a plasmid expressing the mutant orthogonal ribosome or a dedicated cell line can be used during expression.

Improvement of Weakly Active o-aaRS/tRNA Pairs
If only a weakly active or overly promiscuous o-aaRS/tRNA pair is obtained from initial library selections, application of the OTS to large-scale protein expression might be cumbersome due to low protein yields or background incorporation, respectively. A series of approaches can be taken to improve such an o-aaRS/tRNA pair. Starting out from unoptimized, weak hits, o-aaRS/tRNA pairs can often be improved by additional engineering.
Random mutagenesis of the o-aaRS followed by positive selections can be carried out to identify mutations that lie both within and outside of the substrate binding pocket and improve the activity of the o-aaRS on a given ncAA. In a recent example, Zhao et al. improved the activity of a weakly active chPheRS incorporating 4-azido-L-phenylalanine over five rounds of epPCR, positive selection, and screening in microtiter plates to identify mutants with up to 6.2-fold improved suppression efficiency. [73] Additionally, random mutagenesis and selection can be carried out continuously using PACE. Over 24 h, Bryson et al. improved the specificity of a polyspecific PylRS incorporating 4-amino-L-phenylalanine and 4-iodo-L-phenylalanine to predominantly incorporate 4-iodo-L-phenylalanine with a >23-fold specificity, highlighting the power of this approach. [67]
However, having to carry out additional engineering after each library selection is cumbersome and time-consuming. A series of transferable mutations have been identified that are applicable to several o-aaRS across different ncAAs. These can be transferred to the identified hit and often increase suppression efficiencies. The commonly used PylRSs from M. mazei and M. barkeri are two-domain enzymes, containing a tRNA-binding N-terminal domain and a catalytic C-terminal domain. [92] Several engineering efforts have revealed that a series of mutations in the N-terminal domain (R19H/H29R/T122S in PylRS from M. mazei [93] or V31I/T56P/H62Y/A100E in PylRS from M. barkeri [67]) are transferable to other PylRS mutants and can boost suppression efficiencies significantly, often by several fold. The mutations S158N [94] and D161N [95] in the linker and the Y384F mutation in the catalytic C-terminal domain of PylRS from M. mazei have also been shown to increase suppression activities in some cases. [96] Finally, o-aaRS instability can be problematic. In the PylRS system, a P188G mutation has previously been reported to reduce in vivo proteolysis of the o-aaRS and to boost suppression efficiency. [97] In the TyrRS from M. jannaschii, the D286R mutation has been described to increase suppression efficiency by improving TyrT CUA recognition. [57]
The pEvol, [98] pUltra, [99] and pTECH [67] OTS expression plasmids are available from Addgene with various o-aaRS/tRNA constructs and have been shown to deliver good suppression efficiencies.
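When transferring point mutations such as those listed above between o-aaRS variants or homologs, residue numbering offsets are an easy source of error. The following Python sketch shows one way to apply mutations written in the slash-separated notation used in this section (e.g. R19H/H29R/T122S) while checking that the stated wild-type residue is actually present at each position. The function name and the toy sequence in the usage note are hypothetical and do not correspond to any real synthetase.

```python
import re

# One point mutation: wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, mutations: str) -> str:
    """Apply slash-separated point mutations to a protein sequence.

    Raises ValueError if a mutation cannot be parsed, lies outside the
    sequence, or names a wild-type residue that does not match the
    sequence (which usually indicates a numbering offset).
    """
    residues = list(sequence)
    for token in mutations.split("/"):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"Cannot parse mutation: {token!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"Position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(
                f"Expected {wt} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)
```

For example, with the made-up nine-residue sequence `MDKKPLNTL`, calling `apply_mutations("MDKKPLNTL", "K3R/L6V")` substitutes positions 3 and 6, whereas `"A3R"` raises a `ValueError` because position 3 is not alanine.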

Benchmarking Engineered o-aaRS/tRNA Pair
A critical final step in evolving o-aaRS/tRNA pairs is benchmarking the efficiency of ncAA incorporation. Unfortunately, there is no universal standard for benchmarking o-aaRS/tRNA pairs. Thus, it can be difficult to directly compare the relative efficiencies of o-aaRS/tRNA pairs from literature alone. Often, our most salient concern is understanding the extent to which protein yield may decrease when transitioning from wild-type (wt) protein to the expression of mutants containing ncAAs. Thus, in our laboratory, the preferred method of benchmarking an engineered o-aaRS/tRNA pair for amber suppression is to report the relative GFP yield as a function of cell density for i) wt GFP, ii) N150 TAG-GFP, iii) N150 TAG-GFP co-expressed with the PylRS/tRNA pair and the commercially available substrate analog Nε-Boc-L-lysine, and iv) N150 TAG-GFP co-expressed with the o-aaRS/tRNA pair and target ncAA. This information allows for more robust comparisons between different laboratories. Nonetheless, it does not address the incorporation efficiency for a specific protein target, which must be assessed on a case-by-case basis.

Application of an Engineered o-aaRS/tRNA Pair for Amber Suppression
Often, for end users of GCE, the application of evolved o-aaRS/tRNA pairs is the most relevant step. With an efficient o-aaRS/tRNA pair, genetic incorporation of an ncAA into a target protein can be considered a relatively straightforward task, albeit with potential pitfalls. In the presence of the ncAA, the target gene containing the amber codon is co-expressed with the o-aaRS/tRNA pair. The cell produces the desired protein, incorporating the ncAA in response to the amber codon. Despite the simplicity of the general experimental setup, there are several factors that can influence successful ncAA incorporation. Two of the most important factors are sequence context bias and the cell line used for protein expression.

Sequence Context Bias for Amber Codon Suppression
The efficiency of stop codon suppression varies not only with the efficiency of the o-aaRS/tRNA pair but also with the upstream and downstream sequence context of the mRNA and the nascent polypeptide. [101] In an extreme case, amber suppression at one site may result in protein yields similar to wt protein expression, while amber suppression at another site yields no protein or only protein truncated at the amber codon. Although sequence context bias plays a role in ncAA incorporation, the factors underpinning this phenomenon remain challenging to elucidate fully. Instances of sequence context bias are likely under-reported in the literature because they are considered failed research. Moreover, because of the complexity of translation and protein folding, it can be difficult to separate effects related to the mRNA context from those related to protein folding; the latter are often not considered to be sequence context bias, but rather a protein structural feature. However, a few publications have attempted to systematically uncover determinants of sequence context bias. Additionally, it is possible that more information can be gleaned through decades of research on the termination efficiency and sequence context of natural translation termination signals. [111][112][113] Briefly, many studies suggest that sequence context bias exists in the nucleotides both upstream and downstream of a natural stop codon, but differs in prokaryotes and eukaryotes, and may be heterogeneous even among single cells in a population. One of the most frequently studied factors of such bias is the nucleotide immediately after a natural stop codon. This bias is so widespread that it is often said that the stop codon and the nucleotide immediately after it form a tetranucleotide termination signal. In a K-12 E. coli strain, the termination efficiency of amber codon-containing tetranucleotide termination signals follows the trend UAGG > UAGU ≈ UAGA > UAGC. [103] These observations were stronger for genes with higher expression, further indicating the potential importance of these features in efficient translation termination in response to natural termination signals. For eukaryotes, the biophysical features that give rise to a tetranucleotide bias have been characterized structurally. [114] In cryo-EM of the ribosomal complex, compaction of the mRNA is observed upon binding of the eukaryotic release factor (eRF1), which is responsible for termination at all eukaryotic stop codons. This compaction creates a stacking interaction between the nucleotide immediately following the stop codon and G626 (bacterial equivalent G530) of 18S rRNA. Such stacking interactions would be more favorable for purines, which correlates with the bioinformatics observations that purines (A or G) are more frequently observed immediately following a true stop codon in eukaryotes. [104] Based on this research, one might expect to find similar rules governing termination at artificial amber codons targeted for ncAA incorporation.
[121][122][123] The methodologies implemented in each study are usually very different. Thus, as one might expect, conflicting conclusions are reported even within the same cell lines. Some reports find methods for selecting an optimal 5' and 3' context. In contrast, others find no clear predictors of amber suppression efficiency. Notably, a comparison of sequence context for amber suppression with engineered TyrRS/tRNA and PylRS/tRNA pairs demonstrates that there may be differences in bias for the two pairs, suggesting that the aminoacylated tRNA also plays a role in the observed context bias. [115] Such findings only serve to compound questions surrounding sequence context bias and also provide perspective for understanding the conflicting findings between methodologies that employ different ncAAs and tRNAs.
For ncAA incorporation in mammalian cells, a recent study found several determinants of sequence context bias in the two codons upstream and the two codons downstream of an amber codon. The authors developed an online model for designing silent mutations to improve amber suppression: iPASS. [119] The model was developed with PylRS/tRNA from M. mazei and validated with mammalian cells. Thus, the applicability to other o-aaRS/tRNA pairs or bacteria remains unclear. Interestingly, in this study, the efficiency of amber suppression was lower for contexts in which the nucleotide immediately after the amber stop codon was a purine. These results echo nicely the biophysical and genomics results for eukaryotes. [104,114]
In contrast, for bacterial studies of ncAA incorporation in which the authors found predictors of bias, the reported trend for amber suppression efficiency of the tetranucleotide signal is as follows: UAGA > UAGG > UAGC > UAGU. [115,116,124] These results were observed by independent groups assessing both PylRS/tRNA and TyrRS/tRNA pairs. Such results contrast with what one might expect based on genomic analyses of natural termination motifs in bacteria. Thus, it is possible that the mechanisms governing termination at natural amber codons are not the same as those governing internal amber suppression.
Despite a vested interest in its elucidation, a complete understanding of sequence context bias for ncAA incorporation remains elusive, particularly for bacteria. Thus, it is difficult to provide general advice on this subject. Efforts to circumvent sequence context bias can include selection of multiple homologs of the target protein, optimization of mRNA context based on current context bias observations, identification of a highly robust o-aaRS/tRNA pair, engineering elongation factors, [125] implementation of alternative ribosomes, and the use of alternative cell lines. For work with eukaryotes, context optimization may provide facile solutions given the greater amount of understanding in this domain. In contrast, because sequence context bias in bacterial cells is less well defined at present, this route is often not successful. Thus, for ncAA incorporation in bacteria, using cell lines optimized for amber suppression currently provides the most facile alternative.
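When comparing candidate amber sites, it can help to tabulate the tetranucleotide signal and flanking codons discussed in this section for each site. The following Python sketch scans a coding sequence for in-frame internal TAG codons and reports the nucleotide immediately after each one, along with two codons of context on either side. It only describes the context and makes no prediction of suppression efficiency. The function name and the toy sequence in the usage note are hypothetical.

```python
def amber_contexts(cds: str, flank_codons: int = 2):
    """Yield the sequence context of every in-frame internal TAG codon.

    cds: coding sequence (sense strand, DNA or RNA letters), read in
    frame from its first nucleotide. The final codon is skipped because
    it is assumed to be the natural stop codon.
    """
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):
        start = 3 * i
        if cds[start:start + 3] != "TAG":
            continue
        next_nt = cds[start + 3]  # nucleotide immediately after the codon
        yield {
            "codon_index": i + 1,  # 1-based codon number
            "tetranucleotide": "TAG" + next_nt,
            "next_is_purine": next_nt in "AG",
            "upstream": cds[max(0, start - 3 * flank_codons):start],
            "downstream": cds[start + 3:start + 3 + 3 * flank_codons],
        }


if __name__ == "__main__":
    toy_cds = "ATGGCTTAGGAAAAATAA"  # made-up sequence for illustration
    for site in amber_contexts(toy_cds):
        print(site)
```

For the made-up sequence above, the single internal amber codon is codon 3, its tetranucleotide signal is TAGG, and the following nucleotide is a purine. Out-of-frame TAG triplets are ignored by design.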

Cell Lines for Amber Codon Suppression
Cell lines derived from the K-12 E. coli MC1061, such as DH10Beta and Top10, have been instrumental in engineering o-aaRS/tRNA pairs. Most engineering campaigns rely exclusively on these cell lines. A notable exception is S. cerevisiae MaV203, which has also been used to evolve o-aaRS/tRNA pairs, albeit significantly less frequently. [50,70] Derivatives of E. coli MC1061 are attractive tools for the creation, screening, and selection of o-aaRS/tRNA pairs because they have high transformation efficiencies and grow rapidly. [126] However, other cell lines are often preferred for the production of proteins that do not express well in bacteria, the study of proteins in their natural environment, or applications requiring high-yield protein expression. Thus, researchers often want to apply the evolved o-aaRS/tRNA pair in a different cell line.
Protein expression in yeast or mammalian cells is often desired when one wants to produce proteins that are not suitable for expression in bacteria or when one wants to study a protein in its native environment. One major advantage of PylRS/tRNA is its orthogonality to the endogenous cellular machinery of E. coli, yeast, and mammalian cells. Thus, it is often possible to transfer PylRS/tRNA-derived pairs engineered in E. coli to yeast or mammalian cells. However, caution should always be taken when switching between cell lines because changes in efficiency and selectivity can occur. [127][130] These effects can sometimes be mitigated by plasmid engineering or supplementation with very high concentrations of ncAA.
[133][134] The C321.∆A cell line and its derivatives are devoid of the amber stop codon and its cognate release factor (RF1). These cell lines were derived from the K-12 E. coli cell line MG1655, which has 321 TAG codons. [131] Interestingly, for BL21(DE3), a knockout of RF1 is sufficient to improve amber suppression dramatically without the need for genomic recoding of the endogenous amber stop codons. [133] However, the resulting cell line grows more slowly than BL21(DE3). Recoding only 95 of the 273 TAG codons in BL21(DE3) gives rise to a cell line that grows efficiently and exhibits improved amber suppression (B-95.∆A∆fabR). All these cell lines reduce unwanted termination at the desired amber suppression site. Excitingly, these changes have enabled robust incorporation of an ncAA at multiple sites within a protein. Up to ten amber stop codons could be suppressed to produce a protein with ten p-acetyl-L-phenylalanine residues either distributed throughout the protein or clustered consecutively. [133] These advances offer exciting opportunities to study the interplay of multiple protein modifications, such as phosphorylation, in protein function. [135,136] Nonetheless, amber suppression is limited to site-specific modifications with only a single ncAA.
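As an illustrative aside, the idea of genomic recoding can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used to build any of the strains above: each gene is assumed to be supplied as an in-frame coding sequence, terminal TAG stops are assumed to be swapped for the synonymous TAA stop, and the function names and toy gene set are hypothetical.

```python
def recode_terminal_amber(cds: str, replacement: str = "TAA") -> str:
    """Swap a terminal TAG stop codon for another stop codon (assumed TAA).

    Coding sequences that do not end in TAG are returned unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    if cds.endswith("TAG"):
        return cds[:-3] + replacement
    return cds


def recode_genes(genes: dict) -> tuple:
    """Recode every gene and report which ones were changed."""
    recoded = {name: recode_terminal_amber(seq) for name, seq in genes.items()}
    changed = [name for name, seq in genes.items() if seq.upper() != recoded[name]]
    return recoded, changed


# Hypothetical toy gene set: geneA ends in TAG, geneB already ends in TAA.
toy_genes = {"geneA": "ATGGCCTAG", "geneB": "ATGAAATAA"}
recoded, changed = recode_genes(toy_genes)
print(changed)
```

A real recoding effort would also need to account for overlapping genes and regulatory elements that share sequence with the stop codon, which this sketch ignores.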

Genetic Incorporation of Multiple, Distinct ncAAs
Excitingly, GCE has advanced tremendously in recent decades.][139][140][141] This feat can be achieved by combinations of stop codon suppression, [142] quadruplet codon suppression, [143] codons containing non-canonical bases, [144][145][146] and sense codon reassignment with cell lines operating with a reduced genetic code. [147,148] In whole cells, the genetic incorporation of up to four distinct ncAAs is now possible. Such methodologies have been used to study the interplay of post-translational modifications, [149] provide multi-labeling strategies, [139,150] and create cells that resist viral infection and gene transfer. [148,151] Such feats raise the question of what constrains the further expansion of genetic codes.

Challenges and Opportunities
Despite decades of research in identifying and engineering o-aaRS/tRNA pairs, only a few classes of pairs can be practically implemented. Admittedly, even with this limited number of OTS starting points, a surprisingly wide diversity of ncAAs can be genetically incorporated. However, such a small pool of classes may hinder the collective evolvability towards ncAAs of interest. Indeed, some ncAA structures remain challenging, including very small ncAAs. Thus, the identification of o-aaRS/tRNA pairs remains an important pursuit for expanding the chemistries available through GCE. Informatics approaches have already been successful in the discovery of novel o-aaRS/tRNA pairs, [152] and computational methods that combine deep-learning and physics-based approaches hold tremendous promise for transforming the field further.
As described above, sequence context bias for incorporation of ncAAs can be a limitation for some applications and is not yet fully understood for amber suppression. By analogy with amber suppression, it is likely that sequence context bias affects many other codon reassignment methods. The mechanisms of such bias are likely to be specific to the type of codon reassignment. Thus, the further study of sequence context bias for codon reassignment methods may yield richer information for understanding and improving ncAA incorporation.
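To make the notion of a tetranucleotide context concrete, the following minimal Python sketch lists each in-frame amber codon in a coding sequence together with the nucleotide that follows it. The ranks encode only the termination-efficiency trend quoted earlier for a K-12 E. coli strain (UAGG > UAGU ≈ UAGA > UAGC); they are not a measure of suppression efficiency, and the function name and example sequence are hypothetical.

```python
# Termination-efficiency rank of the nucleotide after UAG, following the
# trend UAGG > UAGU ≈ UAGA > UAGC (1 = most efficient termination).
TERMINATION_RANK = {"G": 1, "U": 2, "A": 2, "C": 3}


def amber_contexts(cds: str) -> list:
    """Return (codon_index, tetranucleotide, rank) for each in-frame UAG.

    The rank is None when no nucleotide follows the amber codon.
    """
    rna = cds.upper().replace("T", "U")
    hits = []
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] == "UAG":
            following = rna[i + 3] if i + 3 < len(rna) else None
            tetranucleotide = "UAG" + (following or "")
            hits.append((i // 3, tetranucleotide, TERMINATION_RANK.get(following)))
    return hits


# Hypothetical sequence with ambers at codon positions 1, 3, and 5.
for hit in amber_contexts("ATGTAGGCCTAGCAATAG"):
    print(hit)
```

Such a listing could serve as a first pass when choosing candidate incorporation sites, although context effects beyond the single following nucleotide are not captured here.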
][155] However, many of these cell lines remain challenging to work with because of slow growth rates, low transformation efficiencies, or only partially completed refactoring. The further development of such cell lines will undoubtedly provide tools that enable widespread, robust incorporation of more than three distinct ncAAs into a single protein sequence. In parallel with these efforts, further improvements in quadruplet recoding and expanded genetic alphabets provide alternative routes that could enable equally robust incorporation of multiple, distinct ncAAs.
Lastly, in ncAA-expanded protein engineering, a desired ncAA is selected based on knowledge from small molecule chemistry or the desire to randomly screen different ncAAs. Thus, protein engineering with ncAAs is currently undertaken without strong confidence regarding the effect of the target ncAA on protein function. This is often true of both the modification of natural protein function and the installation of new catalytic modalities into proteins. When the desired ncAA has a known o-aaRS/tRNA pair, such studies can be conducted relatively quickly. However, when an o-aaRS/tRNA pair must be evolved, the process is much longer, and the risks are compounded by the uncertainty of identifying a suitable o-aaRS/tRNA pair for the target ncAA. Thus, the ability to predict the effects of an ncAA on enzyme fitness is of immense interest. In contrast to prAA mutagenesis, for which machine learning techniques are established, no large dataset is available to train predictive models for ncAAs. However, with the emergence of models that combine deep-learning and physics-based approaches, such predictive methods may soon become accessible. A suitably predictive platform could change how we engineer o-aaRS/tRNA pairs. For example, if we knew that a specific ncAA was highly likely to increase the activity of a target enzyme over 10-fold, we might be more willing to justify the application of complex workflows to engineer o-aaRS/tRNA pairs. In this context, one might see more applications of the alternative o-aaRS/tRNA workflows (Fig. 3) or more laborious methods of enzyme engineering, such as substrate walking.

Concluding Remarks
GCE offers a powerful approach to the creation of proteins containing ncAAs. Since its early years, ncAA incorporation using GCE has overcome many of the challenges and critiques of the technique. The early restriction to low-yielding E. coli expression of a target protein in minimal media has long been surpassed. In the next advances of GCE, we expect to see more widespread use of ncAA incorporation by groups with limited prior expertise, the elucidation of factors governing sequence context bias, the ability to robustly incorporate more than four distinct ncAAs, the incorporation of more diverse monomer building blocks, the emergence of computational methods to predict the effects of ncAAs, and a growing tool kit of refactored cell lines. Such advances would undoubtedly contribute to rich discoveries in fields at the interface of chemistry and biology.
Notably, in the process of publishing this review, a novel method for aaRS/tRNA engineering was reported, termed tRNA display. This method directly selects for productive aminoacylation, uncoupling aminoacylation from the subsequent steps, including elongation factor binding and translation. [156] tRNA display could be instrumental in the identification of aaRS/tRNA pairs for noncanonical monomer substrates that are problematic for the downstream translational machinery. These include α,α-disubstituted or β-amino acids, for which aaRS/tRNA pairs have been difficult to engineer using translation-dependent selection methods.

Fig. 1. Genetic code expansion. A) Common applications of GCE for studying proteins. B) A schematic of ncAA incorporation by orthogonal aaRS/tRNA pairs (o-aaRS and o-tRNA).

Fig. 3. Alternative o-aaRS/tRNA engineering workflows. A) A combination of parallel selection and deep sequencing can be leveraged to select for o-aaRS/tRNA pairs in the presence and absence of the target ncAA. Variants of the o-aaRS that can incorporate the target ncAA are ideally enriched in the sequences that are grown in the presence of the ncAA. Analysis of this enrichment can be used to find ncAA-incorporating o-aaRS/tRNA pairs. B) PACE relies on the ability of bacteriophage to cycle through host infection and propagation, delivering a modified bacteriophage genome (SP) that contains the o-aaRS gene for engineering. The E. coli cells contain an accessory plasmid (AP) and a complementary plasmid (CP) that contain genes for the expression of the o-tRNA, T7 RNA polymerase, and gIII. An amber codon is placed internally either in T7 RNA polymerase or in gIII itself. A bacteriophage genome encoding an active o-aaRS will enable translation of gIII, resulting in full-length pIII. The formation of intact bacteriophage expressing pIII enables the bacteriophage to reinfect E. coli. Thus, continuous growth and re-dilution enables enrichment of bacteriophage containing genomes with functional o-aaRSs. C) FACS leverages a fluorescent reporter with an internal stop codon and sorting of cells in the presence and absence of the ncAA (+ncAA and -ncAA, respectively) to enrich o-aaRSs with high activity and low background. D) Bioinformatics approaches leverage o-aaRS/substrate data to train machine learning models that enable the prediction of new ncAA substrates for available o-aaRS/tRNA pairs.
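The enrichment analysis mentioned for panel A can be illustrated with a short Python sketch. It is only one plausible way to score variants, assuming that deep sequencing yields a read count per o-aaRS variant under each condition; the pseudocount, the function name, and the toy counts are hypothetical choices rather than details taken from the workflows cited in this review.

```python
import math


def log2_enrichment(plus_counts: dict, minus_counts: dict,
                    pseudocount: float = 0.5) -> dict:
    """Score each variant by log2(frequency with ncAA / frequency without ncAA).

    A pseudocount is added to every variant so that variants missing from one
    condition do not cause division by zero.
    """
    variants = set(plus_counts) | set(minus_counts)
    total_plus = sum(plus_counts.values()) + pseudocount * len(variants)
    total_minus = sum(minus_counts.values()) + pseudocount * len(variants)
    scores = {}
    for variant in variants:
        freq_plus = (plus_counts.get(variant, 0) + pseudocount) / total_plus
        freq_minus = (minus_counts.get(variant, 0) + pseudocount) / total_minus
        scores[variant] = math.log2(freq_plus / freq_minus)
    return scores


# Hypothetical read counts for two variants under +ncAA and -ncAA selection.
scores = log2_enrichment({"variantA": 90, "variantB": 10},
                         {"variantA": 10, "variantB": 90})
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

Variants with positive scores are more frequent when the ncAA is present and are therefore candidates for ncAA-dependent activity; in practice, replicate selections and read-depth thresholds would be needed before drawing conclusions.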