Generating Bioactive Natural Product-inspired Molecules with Machine Intelligence

The computer-assisted design of new chemical entities has made a leap forward with the development of machine learning models for automated molecule generation. The overarching goal of this conceptual approach is to augment the creativity of medicinal chemists with machine intelligence. In this Perspective we highlight prospective applications of “de novo” drug design and target prediction, aiming to generate natural product-inspired bioactive compounds from scratch. A virtual chemist transforms pharmacologically active natural products into new, easily synthesizable small molecules with desired properties and activity. Computational activity prediction and automated compound generation offer the possibility to systematically transfer the wealth of pharmaceutically active natural products to synthetic small molecule drug discovery. We present selected prospective examples and venture a forecast of the future of natural product-inspired drug discovery.


Introduction
Revitalizing the concept of natural product-inspired drug discovery presents a unique opportunity for future healthcare. Evidently, natural products have traditionally inspired chemists and filled the industry pipelines. [1] Approximately half of the new drugs approved by the U.S. Food and Drug Administration (FDA) are of natural origin or natural product-derived synthetic compounds. [2] For example, marine natural products are of particular interest as anticancer and antiviral agents, with several approved drugs. [3] Similarly, certain edible algae have recently been identified as sources of potential anti-obesity agents. [4] However, despite their unrivalled appeal for pharmaceutical discovery as a source of inspiration, natural products as drugs have to some degree fallen into disgrace in the medicinal chemistry community because of supply problems (ecological sustainability), sometimes lengthy and costly total syntheses (economic sustainability), and an often unknown molecular mode of action (scientific sustainability). [5] Advances in computer-assisted drug discovery, spurred in part by stellar method development in both machine learning (as a subdomain of 'artificial intelligence') and biotechnology, [6,7] promise to overcome some of these limitations. [8,9] In this Perspective we highlight the ability of ligand-based machine learning approaches to generate new chemical structures from scratch by molecular 'de novo' design, and explain how these tools can be used to obtain natural product-inspired drug-like chemical entities (Fig. 1). Selected retrospective and prospective applications illustrate the feasibility of this innovative drug design concept as an amalgamation of traditional and modern medicinal chemistry.

Molecular de novo Design
Molecular 'scaffold hopping' aims to identify pairs of molecules that have markedly different chemical structures but share a certain function of interest, e.g. binding to the same macromolecular target. [10] De novo structure generation provides such sets of structurally diverse molecules that have certain features in common. [11] In fact, automated de novo molecule construction enables medicinal chemists to computationally access a virtually infinite chemical space. Scaffold hopping by de novo design seems particularly suited for finding structurally novel hit and lead compounds. [12] The approach ideally complements virtual chemical library screening, [13] but requires synthesizing the computer-generated molecules.
Molecular design with machine intelligence includes both rule-based and rule-free approaches. [14] Rule-based methods use sets of molecular building blocks and chemical transformations (e.g. virtual reaction schemes) for molecule construction. The DOGS (Design of Genuine Structures) algorithm belongs to this class of methods. [15] In contrast, rule-free 'generative' methods sample new molecules from a learned statistical distribution of the training data, which usually requires a large set of known molecules. [16] Most of the contemporary generative approaches build on deep neural networks. [17] Recurrent neural networks with long short-term memory as 'chemical language models', [18] variational autoencoders, [19] generative adversarial networks, [20] graph neural networks, [21] and various other deep learning architectures [22] have been proposed for this purpose. Rule-free and rule-based methods alike have been successfully employed for prospective small molecule drug design, resulting in new bioactive compounds. [23] Importantly, both types of contemporary de novo methods deliver synthetically feasible molecular designs, thereby overcoming a stigma of earlier approaches. [24] In contrast to generative models, rule-based methods do not require large sets of training data. These methods are applicable when only a single active molecule (the 'template' molecule for ligand-based de novo design) or a model of the binding pocket (for structure-based de novo design) is known.

Generating Natural Product-inspired Molecules with Machine Intelligence
Computational de novo design is increasingly employed for automated small molecule construction, starting from mostly synthetic small molecules as design templates or as training data for the underlying machine learning models. The first applications of generative deep learning (chemical language model) to natural products were published in 2018. [25] Several novel nuclear hormone receptor modulators with micromolar to nanomolar activity were obtained. Selected recent examples of prospective studies aiming to 'scaffold hop' from natural products to isofunctional synthetic molecules are compiled in Fig. 2. Of note, in each case, only a very small number of known bioactive natural products (one to six molecules) served as design template(s) for rule-based compound construction (Fig. 2a-c) or deep generative models (Fig. 2d), respectively.
[Figure panel labels: Natural products; Activity prediction; Natural product-inspired molecules; De novo molecular design; Synthesis; Testing]
Two independent design runs with DOGS software, taking either valerenic acid (1) or dehydroabietic acid (2) as design template, resulted in the identical computer-generated molecule 3. [26] This observation highlights the scaffold hopping capability of rule-based molecular design. Target prediction with SPiDER software [27] predicted retinoid X receptor (RXR) agonism for the two natural products and the de novo molecule. Tetrahydroindole derivative 3 was obtained in a microwave-assisted two-step synthesis. Full dose-response analysis revealed low micromolar potency of the natural product templates and de novo generated compound 3 on all three RXR subtypes.
In an attempt to morph structurally more intricate natural products into small synthetic mimetics, DOGS software was applied to the marine natural product marinopyrrole A (4), which led to computer-generated molecule 5. [28] Compound 5 was obtained in a two-step synthetic route, as suggested by the design software. Target prediction suggested cyclooxygenase-1 (COX-1) inhibition as a hitherto unknown activity of the anticancer agent marinopyrrole A and the de novo molecule. In vitro testing confirmed compounds 4 (IC50 = 7.7 ± 3.9 µM) and 5 (IC50 = 0.10 ± 0.05 µM) as direct COX-1 inhibitors. Of note, molecule 5 inhibited COX-1 with low-nanomolar potency in platelets (IC50 = 0.009 ± 0.001 µM). This de novo designed natural product mimetic preferentially inhibits the biosynthesis of COX-1-derived products in human platelets and monocytes. Compound 5 behaved similarly to indomethacin with regard to COX-1 inhibition in platelets and showed greater COX-1 selectivity. The unique binding mode of this de novo generated compound was confirmed by X-ray structure determination of the ligand-enzyme complex. This example demonstrates the applicability of ligand-based molecular design to practical natural product-inspired medicinal chemistry.
In another study, the structurally complex natural product (-)-englerin A (6), a known inhibitor of transient receptor potential (TRP) cation channels, served as design template for DOGS. [29] Using two different computational scoring methods (pharmacophore-based and shape-based), two different de novo generated molecules were prioritized for synthesis. Compounds 7 and 8 were obtained by three-step and two-step synthetic protocols, respectively, as suggested by the software. Activity determination confirmed the natural product and the computer-generated molecules as potent inhibitors of TRPM8 (Ki = 0.2-0.3 µM). Importantly, the chemical constitution of the natural product templates used for rule-based de novo design served as the only reference information for automated ligand-based molecule construction. This computational approach might, therefore, prove particularly useful in "low-data" situations that are restrictive for de novo drug design with data-hungry generative deep learning.
Using a generative approach (chemical language model) for natural product-inspired molecular design, computer-generated compound 9 was obtained by two-step synthesis and experimentally confirmed as a weak pan-RXR partial agonist (EC50 = 20-30 µM, 6- to 10-fold receptor activation). [30] The importance of this result lies in the fact that only six natural products (known RXR agonists) were used to bias the generative chemical language model, and no explicit target prediction was performed. The machine intelligence implicitly captured the structural requirements of RXR agonists and ranked the de novo generated molecules.

Case Study: Designing Nucleoside Analogs with a Chemical Language Model
In light of the urgent need for novel antiviral therapeutics, de novo design may play a decisive role for future rapid hit and lead identification. To provide another example of generative de novo design, we targeted RNA-dependent RNA polymerase (RdRp) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), [31] aiming to obtain ideas for new SARS-CoV-2 RdRp inhibitors. For this purpose, we trained a generative chemical language model. [32] In the first step, a generic model was obtained by learning the chemical syntax of a large set of known, bioactive drug-like compounds. In the second step, this model was biased toward nucleoside analogues acting as SARS-CoV-2 RdRp inhibitors (Fig. 3a). [33] New molecules were sampled from this fine-tuned model (Fig. 3b).
These de novo generated structures contain several well-known substructures of RdRp inhibitors, but also some innovative moieties. This set of computer-generated molecules, like all molecular de novo designs, should serve as inspiration rather than elaborated lead structures because of certain limitations of the approach. In this particular example, no background information about nucleoside interaction in RNA and the mechanism of RdRp inhibition was considered during neural network training. Neither target selectivity, pharmacokinetic and pharmacodynamic properties, nor the synthesizability of the designs were explicitly considered. Consequently, the suggested molecules will benefit from careful checking by human experts and other computational tools. The selected designs then have to be synthesized and tested before any claim of pharmacological activity can be made. Similar considerations should be taken into account whenever employing molecular de novo design methods.

De-orphaning the Targets of Bioactive Natural Products with Machine Intelligence
Targeted natural product-inspired drug discovery requires knowledge of the pharmacologically relevant macromolecular binding partners of the natural products. Numerous software tools are available for target and activity prediction, ranging from structure-based (e.g. docking) to ligand-based (e.g. substructure-, pharmacophore-, shape-based) methods. [34] No method is perfect, but each has its individual sweet spot. One of the most successful and widely applied tools is TIGER (Target Inference GEneratoR), which has proven applicable to natural products. [35] The TIGER algorithm works on the two-dimensional chemical structure (chemical constitution) of the ligand and does not take the target structure into account. It is thus applicable to a wide range of targets and ligands. Most target prediction tools, including TIGER, were developed using small molecule reference data. Their prediction accuracy typically suffers when applied to larger natural product structures, e.g. macrocycles or peptides. Aiming to partially alleviate this issue, one can virtually dissect the large natural product into smaller portions and perform target predictions for the resulting "drug-sized" fragments. [36] Fig. 4 shows three such examples of new target identification with TIGER. Resveratrol (9) is a small natural product, for which estrogen receptor beta antagonism was predicted and experimentally confirmed (Ki = 0.4 µM). [37] For the medium-sized anticancer depsipeptide doliculide (10) the software revealed prostanoid E receptor EP3 antagonism (IC50 = 16 ± 7 nM, KB = 6 nM). [38] For the polyketide archazolide A, a known inhibitor of V-ATPase, farnesoid X receptor (FXR; agonist, EC50 = 0.2 µM) and other hitherto unknown targets could be identified. [36] Aside from providing straightforward access to target and activity prediction for large natural products, fragment-based prediction sometimes points to the most important function-conveying substructural moieties (magenta colored parts in Fig. 4), which can be useful for chemical derivatization and guided optimization.

Revealing Targets of Complex Natural Products: A Prospective Study
The marine natural product (-)-zampanolide (12) is a microtubule-stabilizing antiproliferative macrolide from the Tongan sponge Cacospongia mycofijiensis (Fig. 5). [39] Its total synthesis was achieved in 2012, [40] and several structural analogues have been obtained synthetically since then. [41] The unmodified natural product (12) potently inhibits the growth of different human cancer cells in vitro with nanomolar IC50 values, whereas its synthetic derivative desTHP-(-)-zampanolide (13) is >100-fold less potent. [40] In contrast to 12, 13 shows no measurable binding to stabilized microtubules up to concentrations of 25 µM (unpublished data). We took this information as a starting point to identify potential human drug targets of compound 13. Based solely on the two-dimensional structure of compound 13, TIGER software (version 16.10) suggested cholecystokinin receptor B (TIGER score = 90), N-formyl peptide receptor 2 FPR2 (TIGER score = 24.5), and prostanoid receptor EP3 (TIGER score = 22.3) as the most confidently predicted targets. The suggestion of FPR2 seemed particularly interesting because of its vital role in cell differentiation and chemotaxis. [42] Experimental validation corroborated compound 13 as a partial agonist of FPR2 in vitro (75% max. receptor activation, EC50 = 5 ± 1 µM, Fig. 6). [43] This prospective example supports the use of machine learning models for ligand-based target prediction and ligand 'de-orphaning'.
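The general idea of ligand-based, fragment-wise target inference can be illustrated with a minimal sketch. It is not the TIGER algorithm, whose internals are not described here. It only assumes that each molecule or fragment is reduced to a set of two-dimensional features and that a target is scored by the Tanimoto similarity to its most similar reference ligand. All feature labels, target names (target_A, target_B), fragment names, and helper functions below are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

FeatureSet = FrozenSet[str]


def tanimoto(a: FeatureSet, b: FeatureSet) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def score_targets(query: FeatureSet,
                  references: Dict[str, List[FeatureSet]]) -> Dict[str, float]:
    """Score each target by its most similar reference ligand."""
    return {
        target: max((tanimoto(query, ligand) for ligand in ligands), default=0.0)
        for target, ligands in references.items()
    }


def predict_from_fragments(fragments: Dict[str, FeatureSet],
                           references: Dict[str, List[FeatureSet]]
                           ) -> List[Tuple[str, float, str]]:
    """Return (target, best score, responsible fragment), best score first."""
    best: Dict[str, Tuple[float, str]] = {}
    for name, features in fragments.items():
        for target, score in score_targets(features, references).items():
            if target not in best or score > best[target][0]:
                best[target] = (score, name)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(target, score, fragment) for target, (score, fragment) in ranked]


# Hypothetical reference ligands per target (toy feature sets, not real data).
REFERENCES = {
    "target_A": [frozenset({"aromatic", "donor", "acceptor"}),
                 frozenset({"aromatic", "acceptor", "lipophilic"})],
    "target_B": [frozenset({"anionic", "lipophilic", "acceptor"})],
}

# Hypothetical "drug-sized" fragments of one large natural product.
FRAGMENTS = {
    "fragment_1": frozenset({"aromatic", "donor", "acceptor"}),
    "fragment_2": frozenset({"lipophilic", "cationic"}),
}

ranking = predict_from_fragments(FRAGMENTS, REFERENCES)
for target, score, fragment in ranking:
    print(f"{target}: score {score:.2f} (from {fragment})")
```

In this toy setting, target_A ranks first because fragment_1 matches one of its reference ligands exactly. Keeping track of the fragment that produced the best score mirrors the observation that fragment-based prediction can point to the substructure most relevant for the predicted activity.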

Conclusions and Outlook
Computational de novo molecular design has proven its value for generating reasonable chemical structures. These computer-generated molecules provide working hypotheses for chemical synthesis and experimental testing. Concluding from several successful prospective applications to natural products, this approach seems suited for finding synthetic analogues and mimetics of bioactive natural products. Both rule-based and rule-free machine learning methods for de novo structure generation can be used for this purpose. Remaining challenges are the structural complexity of certain natural products and the functional relevance of certain substructural elements, e.g. carbohydrate moieties, among others. Also, important activity-determining structural features, e.g. stereocenters, are insufficiently accounted for by the currently available ligand-based de novo design methods.
Recently, there have been attempts to integrate computational molecule construction and activity prediction into a single generative model. These computational models, if proven successful in diverse prospective settings, will be able to automatically generate new chemical entities with desired properties and biological activity, without the need for explicit activity prediction. Certain deep learning approaches have already been shown to be applicable in this regard. [44] Geometric deep learning techniques allow three-dimensional chemical features to be considered explicitly for model building. [21] However, this approach has not yet been applied to natural products. A further area of development is the combination of deep learning with a rule-based molecule construction process, thereby allowing chemists to explore the virtual chemical space that is accessible with the molecular building blocks that are readily available in the laboratory. [45] In the near future we expect to see fully automated laboratories implementing design-make-test-optimize cycles, [46] thereby enabling the full exploration of the structural and functional diversity of natural products for de novo design. According to the World Health Organization, infectious diseases represent a leading global public health threat in the 21st century. [47] At the same time, populations are growing and ageing owing to successes against infections. This situation paradoxically raises the risk of developing chronic diseases. [48] The proposed drug design approach could enable a renaissance of natural product-inspired pharmaceutical research by amalgamating modern medicinal chemistry with generative artificial intelligence. This matchless setting will alleviate some of the limitations of traditional natural product-based drug discovery, and harness the healing power of naturally evolved solutions for future synthetic medicines.