
2021-07-22

From an industrial perspective, the low turnover numbers and limited substrate scope of α-KG halogenases still represent a significant challenge. However, because no chemical processes are currently available for the selective halogenation of aliphatic carbon centers in complex molecular scaffolds such as indole alkaloids, extensive enzyme engineering work seems warranted, as it may lead to a first-in-class technology for stereoselective sp³ carbon functionalization.
Perspective
Optimization of nature’s enzymes for industrial applications has become state-of-the-art, and the development of biocatalysts that are highly active on non-natural substrates at high concentrations and in the presence of organic solvents is routinely achieved within a reasonable time frame [3]. For the evolution of intrinsically complex biomacromolecules such as C–H-activating enzymes, however, the often-used semi-rational enzyme design strategy is rarely sufficient to boost activities to an industrially useful level (Table 1). Here, the application of (ultra-)high-throughput (HTP) experimental techniques, such as nano-droplet/micro-droplet technology [69,] and microarray technology [71], may accelerate evolution efforts, as these techniques enable the analysis of millions of protein variants per hour or per day. However, they typically rely on fluorescence read-outs, which are not readily applicable to typical chemical and pharmaceutical enzyme screening campaigns and are better suited to answering fundamental research questions. Considering the typical number of variants that can be analyzed in a conventional LC-coupled or GC-coupled plate-based assay (at most 10³–10⁴ variants per screening), it becomes evident that smart and focused libraries are required to streamline engineering efforts. Several bioinformatics tools, such as HotSpot Wizard [72] and 3DM [73], link structural and catalytic information of individual enzymes, enabling the identification of key amino acid residues that are intimately involved in catalysis [74]. Complemented by reduced library design using degenerate codons [75] and by recent advances in gene synthesis that generate libraries with high uniformity of variant representation at reasonable cost [76], in silico library design will guide variant selection at the outset of an evolution project.
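The case for reduced libraries can be made concrete with a short calculation. The following Python sketch is illustrative only and is not taken from the cited studies: it expands a degenerate codon using the standard IUPAC ambiguity codes and the standard genetic code, then estimates how many clones must be screened for a given library coverage. The function names, the choice of NNK and NDT as example codons, the three randomized sites, and the assumption that every variant is equally likely are all assumptions made for illustration.

```python
import math
from itertools import product

# IUPAC nucleotide ambiguity codes (standard nomenclature).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*choices)]


def summarize(degenerate_codon):
    """Return (number of codons, sorted amino acids encoded, number of stop codons)."""
    codons = expand(degenerate_codon)
    encoded = [CODON_TABLE[codon] for codon in codons]
    amino_acids = sorted(set(encoded) - {"*"})
    return len(codons), amino_acids, encoded.count("*")


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that the expected fraction of variants sampled
    at least once reaches `coverage`, assuming equiprobable variants."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    sites = 3  # assumed number of simultaneously randomized positions
    for codon in ("NNK", "NDT"):
        n_codons, amino_acids, n_stop = summarize(codon)
        library = n_codons ** sites
        print(codon, n_codons, len(amino_acids), n_stop,
              library, clones_for_coverage(library))
```

Under these assumptions, three NDT sites (1,728 codon combinations, roughly 5,200 clones for 95% coverage) fit within a 10³–10⁴ plate-based screen, whereas three NNK sites (32,768 combinations) do not.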
However, predicting the relationship between sequence/structure and enzymatic activity during an evolution experiment remains a formidable challenge [7,77]. In this context, machine-learning algorithms hold great promise for interpreting data generated in laboratory evolution, as they can model complex relationships between protein sequence and function by finding patterns that are important for activity. Because they are trained on experimental data, the models account for any factors that contribute to specific properties, including those that are unknown. Besides ProSAR [78], several other algorithms have been developed recently that infer the fitness landscape for protein properties, such as enzyme activity and stability, directly from experimental data using Bayesian learning techniques such as Gaussian processes. These algorithms were used, for example, to design thermostable P450 variants [79] and fluorescent proteins with altered fluorescence properties []. The relatively small number of variants screened in these studies highlights the remarkably high hit rates of the machine-learning algorithms. To accelerate the development of C–H-activating enzymes, which to date have mainly been optimized via more traditional semi-rational approaches (Table 1), the combined use of the above-mentioned technologies (i.e., in silico library design, advanced DNA synthesis technology and machine-learning algorithms) should be targeted (Figure 3). As protein engineering continues to mature, C–H-activating enzymes may soon figure more prominently in industrial applications.
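To illustrate how a Gaussian process can infer a fitness landscape from a handful of screened variants, the following is a minimal, dependency-free Python sketch. It is not the implementation used in the cited studies. The kernel (an exponential of the Hamming distance, which equals an RBF kernel on one-hot encoded sequences), the length scale, the noise variance, the four-residue sequences and the activity values are all hypothetical choices made for illustration.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def kernel(a, b, length_scale=1.5):
    """Similarity exp(-d / length_scale) for Hamming distance d (assumed kernel)."""
    return math.exp(-hamming(a, b) / length_scale)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_fit(seqs, y, noise=0.05):
    """Build the noisy kernel matrix and solve for the weights alpha."""
    mean = sum(y) / len(y)
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(seqs)] for i, a in enumerate(seqs)]
    alpha = solve(K, [v - mean for v in y])
    return K, alpha, mean


def gp_predict(seqs, K, alpha, mean, candidate):
    """Posterior mean and variance of the latent fitness of one candidate."""
    k_star = [kernel(candidate, s) for s in seqs]
    mu = mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = kernel(candidate, candidate) - sum(k * w for k, w in zip(k_star, v))
    return mu, max(var, 0.0)


# Hypothetical screening data: invented sequences and relative activities.
TRAIN_SEQS = ["AVLF", "AVLW", "GVLF", "AILF", "GILW"]
TRAIN_ACTIVITY = [1.0, 1.8, 0.6, 1.4, 1.5]

if __name__ == "__main__":
    K, alpha, mean = gp_fit(TRAIN_SEQS, TRAIN_ACTIVITY)
    for cand in ("AILW", "GVLW", "KKKK"):
        mu, var = gp_predict(TRAIN_SEQS, K, alpha, mean, cand)
        # Rank candidates by an upper confidence bound (mean + one std. dev.).
        print(cand, round(mu, 3), round(var, 3), round(mu + math.sqrt(var), 3))
```

In this sketch the predicted variance grows for candidates that are far from every screened variant, so that the model can be used to balance exploiting predicted activity against exploring unsampled regions of sequence space when choosing the next variants to test.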