Excelzyme: A Swiss University−Industry Collaboration for Accelerated Biocatalyst Development

Excelzyme, an enzyme engineering platform located at the Zurich University of Applied Sciences, is dedicated to accelerating the development of tailored biocatalysts for large-scale industrial applications. By leveraging automation and advanced computational techniques, including machine learning, the platform can generate efficient biocatalysts in short timeframes. Toward this goal, Excelzyme systematically selects suitable protein scaffolds as the foundation for constructing complex enzyme libraries, thereby enhancing sequence and structural biocatalyst diversity. Here, we describe applied workflows and technologies as well as an industrial case study that exemplifies the successful application of the workflow.


Introduction
Biocatalysis applies enzymes, Nature's catalysts, for synthetic purposes. The technology has been explored scientifically for well over a century, yielding key insights into the mechanisms that underlie the catalytic capability of various enzyme families. This knowledge enabled the application of enzymes in the chemical industry, where product quality, stability and overall process efficiency are indispensable. [1] Besides their exceptional chemo-, regio- and stereoselectivity, which makes them particularly promising for meeting specific product requirements, [2,3] biocatalysts can render chemical syntheses and processes more environmentally benign by lowering the number of reaction steps and the consumption of energy and raw materials, in addition to reducing byproduct and waste generation. [4][5][6][7] Engineering enzymes for asymmetric transformations and integrating them into chemo-enzymatic cascade reactions to produce the desired small molecules has become a valuable tool in drug development and manufacture. [10][11][12][13] Notable examples from the pharmaceutical industry include 1) the ketoreductase-mediated synthesis of a chiral diol intermediate of an antitumoral gamma secretase inhibitor; [14] 2) the chemoenzymatic processes involving transaminases to synthesize chiral amine precursors of antidiabetic sitagliptin [15] or antihypertensive sacubitril; [16] 3) the application of an imine reductase for reductive amination and consecutive kinetic amine resolution to yield an anticancer lysine-specific demethylase-1 inhibitor; [17] and 4) the comprehensive synthesis of anti-HIV islatravir through a series of reactions involving five evolved and four auxiliary enzymes. [18][19][20][21]
However, sometimes enzymatic routes, even if relying on optimized enzymes, fall short of meeting expected key performance indicators (KPIs), such as acceptable conversion at high substrate load, satisfactory space-time yields and straightforward downstream processing, and thus might necessitate further improvements in order to be comparable to well-established chemical routes. [22] While no fail-proof recipe exists, a key step in developing an effective enzyme for a technical application is the selection of a suitable protein scaffold as a starting point for enzyme engineering. The enzyme selection process must be approached on a case-by-case basis to fit the required process demands: while in some cases the most active enzyme will be chosen for further development, in other cases characteristics such as thermostability or selectivity define a given enzyme's suitability to serve as a starting scaffold. [23,24] Once a biocatalyst with at least a minor ability to produce the desired product is identified, it then serves as a parent for subsequent enzyme engineering strategies aimed to tailor the catalyst's properties. For this purpose, many strategies and methods have been developed, including directed evolution, semi-rational enzyme engineering including computer-assisted design, and most recently, algorithm-aided enzyme engineering harnessing machine learning approaches (Fig. 1). [25]
The Nobel-prize winning technology of directed evolution enables the enhancement of enzyme performance by mimicking the process of natural evolution in the laboratory: following the generation of genetic diversity through random mutagenesis, the encoded enzyme variants are subjected to rigorous screening. Improved variants are selected based on specific criteria such as activity, selectivity or thermo- and solvent stability. [26,27] Notably, directed evolution does not require an understanding of the enzyme's underlying structure-function relationship. However, in a typical directed evolution campaign huge libraries of enzyme variants need to be screened to find viable solutions, making it a resource-intensive process. [26,28]
In contrast, semi-rational enzyme design relies on structural information or is guided by conserved amino acid sequences. By using this additional information, more targeted enzyme variant libraries can be generated, in this way reducing the subsequent screening burden. [25,29] Thanks to the significant expansion of computing power and advances in protein design algorithms, computer-aided design has emerged as an additional tool to customize protein scaffolds for application. Computer-aided design primarily relies on the exploration of protein structure, substrate binding mode models as well as protein dynamics under different conditions (e.g. temperature, pH or solvent). [25] By applying fundamental principles of chemistry, physics, and statistics, alterations in energy as well as interactions between individual amino acids and ligands are identified during structural adjustments. [32] Lastly, artificial intelligence (AI) holds great promise to model complex relationships between protein sequence and function by finding sequence patterns which are important for enzymatic function. [33] Trained on experimental data, AI models can implicitly account for factors that contribute to specific properties, including those that are unknown. [34,35]

Excelzyme Collaboration
Established in 2019 by the Competence Center for Biocatalysis (CCBIO) at the Zurich University of Applied Sciences (ZHAW) and the Process Chemistry & Catalysis Department at F. Hoffmann-La Roche Ltd. (Roche), Excelzyme aims to accelerate enzyme discovery and engineering campaigns to yield biocatalysts suitable for their implementation in drug substance manufacturing processes.

Excelzyme Enzyme Discovery Workflow
At the initiation of each project, suitable enzyme starting scaffolds are mined. Using available enzyme wildtype libraries at CCBIO and Roche, which are complemented with literature-sourced biocatalysts, enzymes with initial activity for the desired transformation are identified. The sequences of active enzymes then serve as input for mining of public databases, such as NCBI and GenBank, [36] PDB [37] or UniProt [38] to collect relevant evolutionary relatives around the promising starting enzyme candidates. The efficacy of this mining process can be improved through the use of sequence search and clustering tools like DIAMOND, [39] OrthoFinder2 [40] or MMseqs2, [41] which provide a better balance between speed and sensitivity compared to more established methods (e.g. PSI-BLAST). Combined with careful dataset curation, such as redundancy and length filtering, signal peptide detection, as well as in-depth evaluation of domain composition (e.g. transmembrane regions or domains), the mining process typically yields a diverse set of protein sequences. This curated sequence data is not only exploited for the discovery of functional orthologs, but also for the identification of conserved, diversifying amino acid residues and related protein functionalities, [40,42] information which can be profitably employed in the subsequent engineering campaign. Thanks to advances in homology modeling [43][44][45] and AI-enabled de novo structure prediction, [46][47][48][49][50] sequences of interest for which no experimentally determined structure is available can be modeled, revealing the enzyme's tertiary and quaternary structures, oligomerization states, as well as cofactor dependencies. With this information, the capability of the orthologous enzyme to bind the substrate of interest can be preliminarily evaluated. In addition, generalized predictors of industrially relevant properties, such as solubility, [51,52] thermostability, [53][54][55][56] and pH stability [57] may be included in the selection process. However, the bioinformatic workflow heavily relies on data-informed or experience-guided assessments of the models to avoid undesirable trade-offs, such as increased protein stability at the cost of significantly lower expression, solubility, or activity (Fig. 2, Table 1). [58]
During an Excelzyme enzyme discovery phase, a first set of up to hundreds of orthologous candidates with a typical shared protein sequence identity of 30-80 % is screened for the desired reaction. Clearly, the actual ortholog library size and protein sequence distribution depend critically on the size and diversity of the considered enzyme family. While some enzyme families, such as α-ketoglutarate dependent halogenases, might only have a few representatives, distinctly larger enzyme families, such as the short-chain dehydrogenase/reductase (SDR) family, [59] are a promising source for diverse screening panels sharing low sequence identity yet similar catalytic functionalities. [60,61] These initial discovery rounds are usually followed by a more refined search, where tens of candidates with higher phylogenetic, structural, or motif-specific homology toward the seed enzyme are selected and screened to identify an even better-suited enzyme starting scaffold. In this way, not only bioinformatic tools and molecular modeling, but also statistical analysis and data science allow us to shed light on the molecular basis of enzymatic performance. Overall, the data-driven enzyme candidate selection usually leads to improved starting points for subsequent enzyme engineering.

Excelzyme Enzyme Engineering Workflow
Once the starting protein scaffold is defined, a set of methods for random or rational enzyme engineering are employed (Table 1). When information on the sequence-function relationship of the selected enzyme is insufficient, error-prone PCR-based methodologies [28] or (deep) mutational scanning [99] are applied. Clearly, the insights gained through these approaches can be leveraged in subsequent engineering rounds by analyzing the randomly generated diversity. [100] In case of knowledge-driven enzyme engineering for increased activity or selectivity, library design relies on the identification of key amino acid residues involved in substrate binding and catalysis. These hotspots are identified via conservation analysis, enzyme modeling, molecular docking, and dynamics. Enhanced thermal or solvent stability is attained via in silico variant design based on evolutionary information, thermodynamic stability, and protein flexibility. In this context, it is important to assess and benchmark all applied computational tools to minimize the probability of spurious correlations, allowing us to build a repertoire of validated and complementary methods for future engineering cases with similar problem definitions.
To predict the protein fitness landscape from protein sequence characteristics using data from a fraction of a given library, [34] we rely on machine learning (ML) algorithms. For example, Bayesian learning techniques such as Gaussian processes have been demonstrated as useful for the design of thermostable P450 variants, [101] fluorescent proteins with altered fluorescence properties, [102] more active and regio-divergent halogenase variants, [103] and toluene-degrading monooxygenase mutants with increased substrate specificity. [104] The relatively small number of variants screened in these studies illustrates the high hit rates achievable with ML. More recently, deep learning and generative models have found application in protein and enzyme design. For instance, the UniRep neural network utilizes deep learning to extract essential protein structure characteristics directly from amino acid sequences, accurately predicting the impact of amino acid replacements on protein function and saving significant resources. [96] Furthermore, RFdiffusion is a highly effective generative model for protein backbones, exhibiting exceptional performance in various protein design tasks and holding great potential for the de novo design of artificial proteins or enzymes. [97,98]

Excelzyme Technology Platform
To meet the often challenging project timelines of chemical process development, we rely on automation for the efficient production of enzyme variants in the desired formulation. Additionally, automated analysis of sequencing and function data speeds up the interpretation of the screening experiments.
An enzyme optimization round involves several key stages (Fig. 3). In the first step, variants or screening libraries are designed according to the selected enzyme engineering strategy. Hotspots can be tackled individually or in a combinatorial fashion, leading to in silico libraries of hundreds of thousands to millions of variants, which can be experimentally created by reduced library design with degenerate codons, [105] design of oligo pools, [91] or by full or partial gene synthesis with high uniformity of variant representation at reasonable costs. [106] Subsequently, these gene constructs are directly utilized or further processed to generate enzyme variants through appropriate mutagenesis and cloning strategies (Table 1).
To screen enzyme libraries, the Excelzyme platform employs a Tecan Fluent robotic platform allowing automated colony picking, cultivation, protein expression, cell harvesting and cell lysis procedures. Automation of these steps supports the rapid and uniform preparation of whole cells or cell lysates for downstream screening. In each case, screening conditions are continually updated, drawing insights from reaction- and process-engineering outcomes from the industry partner. Depending on the product's properties and the enzyme engineering targets, various analytical techniques are deployed, such as rapid achiral UHPLC-UV/MS for conversion, chiral LC/GC for selectivity and nano differential scanning fluorimetry (nanoDSF) for protein melting temperature determination. Following these measurements, the function data of the generated enzyme variants is gathered and analyzed in an automated manner using scripts that report fold-improvement over the parent values, enable quality checks, and generate plate heatmaps for effective data visualization. Verification of hit variant sequences is accomplished by outsourced Sanger sequencing. Sequencing data is analyzed using a custom script that evaluates sequencing read length and quality, enables manual checks of the DNA chromatograms, and generates output files of the variants and their corresponding amino acid substitutions. To obtain the sequences of hundreds or thousands of variants required to train ML algorithms, we additionally implemented multiplexed nanopore-based next-generation sequencing (NGS). [107] Recently, the high error rate associated with nanopore sequencing has been overcome by introducing unique barcodes into individual sequences via PCR, followed by the processing of sequencing data through customized statistical analysis for consensus basecalling. [77,78] Our in-house technology currently allows us to elucidate the sequences of 1'152 variants (12 × 96-well plates) per single-use flow cell, requiring a minimum of 50 reads per amplicon to achieve ≥ 99 % accuracy at the single nucleotide level in 72 h, including lab work and sequencing. Furthermore, nanopore sequencing can be applied to sequences beyond 1 kb, the read-length limit inherent to Sanger sequencing. This translates into more substantial cost savings when larger genes (> 1 kb) are sequenced. The generated sequence-function data is then curated and subjected to statistical analysis, modeling, or ML. These analytical tools help in data-driven variant selection and library design for subsequent engineering rounds, allowing for continuous enhancement of enzyme performance. [34,100]

Table 1. Selection of techniques and bioinformatic tools which, among others, are employed in Excelzyme's enzyme discovery and engineering workflows.

Techniques or tools | Purpose in pipeline | Description, indicators | Ref.

Enzyme variant library generation (n: number of mutation sites)
cepPCR | random mutagenesis | fragment-targeted or "casting" error-prone PCR | [62]
mutational scanning | (random) single-site saturation mutagenesis | n = 50-150; gene synthesis | [63]
NNK, 22c-trick, customized degenerate oligos | single-site or combinatorial saturation mutagenesis | n ≤ 3; degenerate oligos with reduced or no codon redundancy | [64,65]
oligo pools | combinatorial saturation or focused combinatorial site mutagenesis | n > 3; ≤ 300 bp-oligo synthesis, unlimited pool size | [66]
MEGAWHOP PCR | mutagenesis, gene assembly | whole plasmid amplification via a megaprimer | [67]
In-Fusion, TEDA | gene assembly | efficient ligase-free seamless cloning | [68-70]

Screening
rapid achiral UHPLC-UV/MS | substrate conversion, product formation, regio- or diastereoselectivity | % conversion, product distribution, % de (high throughput: ≤ 60 min per 96-well plate) | [71]
chiral LC/GC | stereoselectivity | % ee or % de (variable throughput) | [72,73]
UV/Vis spectroscopy | initial reaction rate | min^-1 (moderate throughput: 60-90 min per 96-well plate) | [74]
nanoDSF | thermostability | Tm (48 samples per run) | [75]

(Next-Generation) Sequencing
Sanger | hit confirmation | plates or individual variants; outsourced service | [76]
Nanopore consensus NGS | multiplexed sequencing for ML applications | 2.3k variants (2 single-use flow cells), ≥ 99 % accuracy at ≥ 50 reads per variant | [77,78]

Computational
DIAMOND, OrthoFinder2, MMseqs2 | ortholog search (discovery) | phylogenetic orthology ("orthogroup") clustering | [40,41,79]
Protein-Sol | solubility prediction | sequence-based, for ortholog search | [52]
DeepSTABp, ProTstab2 | stability prediction | sequence-based, for ortholog search | [53,55]
SWISS-Model | structure prediction | accurate homology modeling when known templates are available | [80]
AlphaFold (DB), OpenFold | structure prediction | accurate AI-based ligand-free structural prediction of novel enzymes | [49,50,81]
3DM | hotspot identification | multiple sequence alignment analysis of a superfamily | [82]
AutoDock Vina | hotspot identification | molecular docking (substrate, intermediate, product and/or cofactor) | [83]
MOE, Moloc | hotspot identification | ligand binding mode modeling | [84,85]
PLIP (PyMOL Plugin) | hotspot identification | protein-ligand interaction profiler | [86]
CAVER 3.0 (PyMOL Plugin) | hotspot identification | ligand access channel or tunnel analysis | [87]
AMBER, OpenMM, MDTraj | hotspot identification, post-analysis | MD simulations for analysis of protein conformational changes, protein-ligand interactions, solvent models | [88-90]
LibGENiE | deleterious mutation identification | sequence space reduction by excluding destabilizing mutations | [91]
Algorithm 2.1 GPML | activity & selectivity prediction | Gaussian processes for ML | [92,93]
EvoEF2, ACDC-NN | (thermo-)stability prediction | structure-based prediction of free energy changes in protein variants | [94,95]
UniRep | activity & stability prediction | sequence-based deep representation learning | [96]
RFdiffusion | de novo protein design | generative model for protein backbones | [97,98]

Case study: Data-driven KRED Evolution

KREDs in the Pharmaceutical Industry
Nowadays, engineered ketoreductases (KREDs) are ubiquitous in the pharmaceutical industry and applied in reactions with NAD(P)H as hydride donor and generally isopropanol for cost-effective cofactor regeneration. [115,116] These biocatalysts have enabled access to a wide array of chiral synthons leading to APIs like atorvastatin, [118,119] montelukast, [120] simeprevir, [121,122] and ipatasertib. [123,124]

Fig. 3. Iterative enzyme engineering cycle enabled by the Excelzyme technology platform. Depending on the selected evolution strategy, enzyme variant libraries or individual variants are designed (A). Mutated genes or gene fragments are generated by outsourced solid-phase oligonucleotide synthesis (B) or by PCR amplification using oligo pools or individual mutagenic primers (C), followed by suitable ligase-free cloning methods. Fully automated colony picking, cultivation, protein expression, cell harvesting, or cell lysis protocols (D) are applied to obtain whole cells or enzyme lysates for high-throughput screening (E). Based on the desired performance indicators and the target product's properties, rapid UHPLC-UV/MS (conversion), chiral LC-UV/GC-FID (selectivity), UV/Vis spectroscopy (initial reaction rate), or nano differential scanning fluorimetry (thermostability) are usually applied (F). Variants are sequenced by Sanger (hit confirmation) or in-house by Nanopore-based consensus NGS (≤ 10 % of library size for ML applications) (G). The data generated during each optimization round is then curated and used for statistical analysis, modeling or ML to allow data-driven variant selection or library design for subsequent engineering rounds (H).

Engineering of a KRED for the Synthesis of an Ipatasertib Precursor
Ipatasertib (3) is a potent protein kinase B inhibitor developed for the treatment of metastatic castration-resistant prostate cancer and triple-negative metastatic breast cancer. [125][126][127] Its synthesis spans ten steps and involves eight isolated intermediates, leveraging chemical and enzyme catalysis for the incorporation of three stereocenters. [123,124] One of these chiral centers is established through the asymmetric KRED-catalyzed reduction of prochiral ketone 1 to the desired (R,R)-trans alcohol intermediate 2 (Scheme 1). We focused on this reaction step to showcase the efficacy of automation- and algorithm-aided enzyme engineering for the development of a high-performance biocatalyst for a potential second-generation process. [74]
To identify a robust enzyme starting scaffold for engineering, we screened our in-house KRED toolbox consisting of 63 wildtype enzymes of bacterial, fungal and plant origin for the conversion of 1. [74] Following the screening, the NADP+-dependent aldehyde reductase II from Sporidiobolus salmonicolor, [128,129] herein denoted as Ssal-KRED, was selected for further engineering owing to its absolute stereopreference for 2 (Fig. 4A). [132][133] At the outset of the study, we defined the campaign goal as improving Ssal-KRED's performance by a factor of 50 in the presence of isopropanol for NADPH regeneration. Enzyme variant performance was measured as fold-improvement over the wildtype enzyme (FIOWT), calculated from a kinetic UV assay based on consumption of 1. Informed by mutational scanning data on every second amino acid within the protein in addition to a substrate-bound enzyme model and literature, six sites in the substrate entrance tunnel or substrate binding cavity (F97, L174, A238, L241, M242, Q245; Fig. 4C) were chosen to be explored in single-site saturation mutagenesis (SSM) libraries. The screening of these libraries resulted in the identification of several beneficial mutations, including F97W, M242F, and Q245T, with a FIOWT between 1.4 and 3.6. Building on the obtained data, we set up four combinatorial 3-site and one combinatorial 5-site saturation mutagenesis (CSM) libraries by varying positions L174, A238, L241, M242, Q245 while fixing mutation F97W (all other substitutions at position 97 had exhibited a detrimental effect in the prior SSM library). With an ML application in mind, we then opted to screen a fraction of each library. Notably, ML can be applied at any stage of an enzyme engineering campaign provided sufficient and high-quality data is available for training the algorithm. To obtain the necessary sequence-function data, we thus covered ~7 % and 0.024 % of the theoretical size of the 3- and 5-site CSM libraries, respectively. In this screen, we identified variant M1 (F97W/L241M/M242W/Q245S) as the best performing variant, which exhibited a FIOWT of 8. By combining M1 with beneficial surface residue mutations L316M and T342M, identified in the initial mutational scanning library, variant M2 was created. This variant exhibited a FIOWT of 9.
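The coverage figures quoted for the CSM libraries can be turned into approximate variant and plate counts. The following is a minimal sketch that assumes the theoretical library size is counted at the amino acid level (20 residues per randomized site), which the text does not state explicitly; the function names are illustrative and not part of any Excelzyme tooling.

```python
import math

AMINO_ACIDS = 20       # assumption: library size counted per amino acid, not per codon
WELLS_PER_PLATE = 96   # 96-well screening format mentioned in the text


def theoretical_size(n_sites: int) -> int:
    """Number of distinct amino acid combinations for n fully randomized sites."""
    return AMINO_ACIDS ** n_sites


def screened_variants(n_sites: int, coverage: float) -> int:
    """Approximate number of variants screened at a given fractional coverage."""
    return round(theoretical_size(n_sites) * coverage)


def plates_needed(n_wells: int) -> int:
    """Number of 96-well plates required to hold n_wells samples."""
    return math.ceil(n_wells / WELLS_PER_PLATE)


# Coverage values taken from the text: ~7 % (3-site) and 0.024 % (5-site).
for sites, coverage in [(3, 0.07), (5, 0.00024)]:
    n = screened_variants(sites, coverage)
    print(f"{sites}-site library: {theoretical_size(sites)} combinations, "
          f"~{n} screened, ~{plates_needed(n)} plates")
```

Under this assumption, a 3-site library holds 8'000 combinations, of which about 560 would be screened (roughly six plates), and a 5-site library holds 3.2 million combinations, of which about 768 would be screened (eight plates). These numbers count screened wells, not unique sequences.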
As a next step, the sequence-function data of the CSM libraries was used as input to train a Gaussian process-based ML algorithm. [101,134] Toward this goal, we used, among others, the data derived from the multi-site CSM libraries to train the algorithm (2'600 datapoints). Instead of constructing specific variants individually, we opted to build a small variant library (ML-filtered library) which contained the top amino acids the ML algorithm had predicted to be beneficial. This library consisted of 75 variants and, gratifyingly, its screening revealed variant M3, which contained the additional mutation A238K and displayed a FIOWT of 22. While variant M3 would have also been identifiable through combining beneficial mutations from the SSM libraries, [135,136] it should be noted that the ML approach allowed us to obtain such results by screening only 75 variants instead of 29'400 variants. The latter number corresponds to the size of the hypothetical library if all beneficial mutations identified for the six hotspots were to be explored in a combinatorial manner with the respective wildtype amino acids. In practical terms, screening such a library would have required the preparation and analysis of approximately one thousand 96-well plates, considering the necessary 3-fold oversampling.
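To illustrate the general idea of Gaussian process regression on sequence-function data, the following is a minimal, self-contained sketch. It is not the algorithm used in the study: the kernel (a radial basis function on one-hot encoded residues, written via the Hamming distance), the noise level, and all sequences and activity values are hypothetical and chosen only for illustration.

```python
import math


def kernel(a, b, length_scale=1.0):
    """RBF kernel on one-hot encoded sequences.

    For one-hot vectors the squared Euclidean distance equals twice the
    Hamming distance d, so exp(-|x - y|^2 / (2 l^2)) = exp(-d / l^2).
    """
    d = sum(x != y for x, y in zip(a, b))
    return math.exp(-d / length_scale ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve(low, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GaussianProcess:
    """Toy GP regressor with a constant mean equal to the training average."""

    def __init__(self, seqs, values, noise=1e-6):
        self.seqs = list(seqs)
        self.mean = sum(values) / len(values)
        n = len(self.seqs)
        gram = [[kernel(self.seqs[i], self.seqs[j]) + (noise if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
        self.low = cholesky(gram)
        self.alpha = solve(self.low, [v - self.mean for v in values])

    def predict(self, seq):
        """Return the posterior mean and variance for one sequence."""
        k_star = [kernel(seq, s) for s in self.seqs]
        mu = self.mean + sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve(self.low, k_star)
        var = kernel(seq, seq) - sum(k * x for k, x in zip(k_star, v))
        return mu, max(var, 0.0)


# Hypothetical training data: residues at four hotspot positions -> activity.
training = {"LAMQ": 1.0, "LAWQ": 2.5, "LAMS": 1.8, "MAMQ": 1.5, "LKMQ": 2.0}
gp = GaussianProcess(list(training), list(training.values()))

# Rank unseen (hypothetical) combinations by predicted activity.
candidates = ["LKWS", "MKWQ", "MAMS"]
ranked = sorted(candidates, key=lambda s: gp.predict(s)[0], reverse=True)
print(ranked)
```

The posterior variance is what makes this model class attractive for library design: candidates far from the training data receive a high variance, so predictions can be weighed against their uncertainty when choosing which variants to build. For scale, the hypothetical 29'400-variant library screened with 3-fold oversampling corresponds to 88'200 clones, or about 919 plates of 96 wells, in line with the order of magnitude stated above.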
Using the ML-improved protein scaffold as parent, we further increased enzyme activity and stability by targeting experimentally determined and literature-based [130][131][132]137,138] amino acid positions in the frame of iterative site mutagenesis as a complementary strategy. [139] Upon tackling fourteen new positions located in the substrate- or NADPH-binding sites, we identified beneficial substitutions Y246G, S224A and T134V in three successive rounds and obtained hit variants M4, M5 and M6 with FIOWT values of 24, 29 and 58, respectively (Fig. 4B).
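The cumulative FIOWT values reported for M1 to M6 can be converted into per-step gains to see where the largest jumps occurred. This is a simple bookkeeping sketch that uses only the values given in the text and assumes they are directly comparable across rounds (same assay and conditions); the variable and function names are illustrative.

```python
# Cumulative fold-improvement over wildtype (FIOWT), as reported in the text.
fiowt = {"WT": 1, "M1": 8, "M2": 9, "M3": 22, "M4": 24, "M5": 29, "M6": 58}


def stepwise_gains(trajectory):
    """Fold gain of each variant relative to its direct predecessor."""
    names = list(trajectory)
    return {
        names[i]: trajectory[names[i]] / trajectory[names[i - 1]]
        for i in range(1, len(names))
    }


gains = stepwise_gains(fiowt)
for variant, gain in gains.items():
    print(f"{variant}: {gain:.2f}-fold over its parent")
```

Read this way, the largest relative gains come from the first combinatorial round (M1), the ML-filtered library (M3, about 2.4-fold over M2) and the final substitution leading to M6 (2-fold over M5).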
After completing six evolution rounds on Ssal-KRED, targeting 180 amino acid positions across 13 libraries, we successfully achieved our project's goal with the 10-amino acid variant M6 (F97W/T134V/S224A/A238K/L241M/M242W/Q245S/Y246G/L316M/T342M) (Fig. 4C). Notably, M6 exhibited a 64-fold higher apparent kcat and improved robustness under process conditions compared to the wildtype enzyme. Kinetic studies and modelling suggested that key mutations for the efficient reduction of 1 to 2 were the substrate access tunnel or substrate binding mutations T134V, A238K, M242W and Q245S, while changes on the enzyme surface were likely responsible for enzyme stability. While reactions with the wildtype enzyme resulted in 26 % conversion and > 99.5 % de (R,R-trans) at 100 g/L of 1 after 24 h, preparative scale reactions with M6 resulted in ≥ 98 % conversion and 99.7 % de (R,R-trans) at the same substrate loading after 30 h, showcasing the technical and commercial viability of the process based on the engineered enzyme variant.

Conclusions and Outlook
The Excelzyme enzyme discovery and engineering platform allows the fast development of biocatalysts, in line with the often challenging process development timelines.Typically, an Excelzyme project, including enzyme discovery and engineering, runs for 10 to 11 months, with one evolution round being limited to 4 to 5 weeks of library design, experimental work, and data analysis.In this fashion, Excelzyme has delivered additional engineered enzymes stemming from several different enzyme classes to be employed for drug synthesis.Importantly, as a collaborative venture between academia and industry, the platform equally profits from enzyme engineering and process development knowhow, which need to go hand-in-hand to yield viable biocatalytic manufacturing routes.
Complementing semi-rational enzyme engineering principles, Excelzyme evaluates and employs bioinformatic tools.As more robust and powerful algorithms are being developed, we expect that computer-and AI-assisted enzyme engineering will be able to address a broader variety of enzyme design challenges, thus enabling more time-and resource-efficient development of versatile biocatalysts suited for various industrial applications.

The Excelzyme team. From left to right, back row: Rebecca Buller and Hans Iding; front row: Nadine Duss, Daniela Milbredt, Sumire Honda Malca, and Peter Stockinger.

Fig. 1. Strategies for enzyme engineering. Directed evolution mimics natural evolution by random introduction of diversity and subsequent selection of the desired properties. Mutations are introduced more selectively in the context of (semi-)rational and computer-aided design. Trained by experimental data, artificial intelligence (AI) allows the identification of complex relationships between protein sequence and function by finding patterns relevant for enhanced enzyme activity, selectivity, or stability.

Fig. 2. Excelzyme's enzyme discovery workflow. Based on sequence and structural data from public databases, state-of-the-art methods in phylogenetic analysis, protein feature predictions, and molecular modeling are applied for the strategic design of a curated collection of orthologous enzyme candidates with putative function. Experimental data of the enzyme set towards the desired reaction serves as seed for a second fine-tuned collection. This iterative process typically leads to the identification of a protein scaffold with improved properties, well-suited for subsequent engineering efforts.