CCc1ccc(-c2nc(-c3nnc4n(OC)c5c[nH]nc5n34)n3nc(C)cc(C)c23)c(=O)n1C(C)[C@H](C)C1C=CCN1
COC1=CCC(O)=C1[C@@H]1C=C(C)N(C(=O)[C@@H]2CC3=CCC(OC)=C3C(=O)O2)C2=C1C=CCC2
CCCCc1cc(C)cc(C(=O)OC2=C(C)CC=C2C)c1NC(=O)c1nc(F)c2c(=N)nc(N)[nH]n12
CCC1=C([C@H]2O[C@@H](n3c(=S)[nH]c4c3=NNCC=4OC)[C@H](C)[C@@H]2C)C(=O)N2CC(OC)=CC(C)=C12
CN1C(C2=CC=CC2)=c2oc([C@@H]3C=CC=CN3C(=O)C3=CCC(=O)N3)nc2=CC1C1=CC=CC1
CCN1C(=O)N[C@](C)(n2[nH]cc2CN2C(=O)[C@](NC(C)=O)(C3C=CC=CC3)C3=C2C(C)=C(C)C3)C1=O
CCC[C@@H](CC)N(CCC(C)C)C(=O)NOC1=C(C)CC(C)=C1NC(=O)[C@@H]1C=CCC(=O)O1
CC1=CC(S)c2c1n(C1=C(S)C(S)=CC1)c(=O)n2NC(=O)Nc1c2c(nn1C(C)C)OCCO2
Cn1c(Cn2c(-c3nc4c(c(=O)n3C)=NCC=4N)cc3c2C=CCC3)nc2c(c1=O)CNC=C2
CCO[C@@H](NC(=O)N1C(=O)C2(O[C@@H]3C=CC=C[C@H]3O2)C2=CCC(CC)=C21)C(=O)c1n[nH][nH]1
CC(=O)CCC(=O)n1c2c(n(NC(=O)CN3C=[SH]C=NC3(C)C(C)=O)c1=O)=CCCC=2
C=COC1=C(O)c2ccnn2[C@H]1NC(=O)N1C(=O)C2(O[C@@H]3CC=C(N)[C@H]3O2)c2ccccc21
CCN1C(=O)N[C@](C)(n2[nH]cc2CN2C(=O)[C@](NC(C)=O)(C3=C(C(C)=O)CC=C3)C3=C2C(C)C(C)=C3)C1=O
CCCN(C(C)=O)N(CC)C(=O)[C@H]1O[C@@H](N2C=CNc3c2nc[nH]c3=O)[C@H](C)[C@@H]1C(C)C
CC(C)CN(CC1=CCCC1)C(=O)NOc1ncccc1NC(=O)[C@@H]1C=NNC(=O)O1
CCC1=NN(C)CC2=C1N(Cc1c[nH]n1[C@@]1(C)NC(=O)N(C)C1=O)C(=O)[C@]2(NC(C)=O)C1=CCC=C1
CC[C@]1(Nn2c(=O)nn[nH]c2=O)N=CCN(n2c(=O)n(CN3N=CCC3=O)c3c(C)ccc(Cl)c32)C1=O
C=CC1=C([C@H]2COC3C(=O)C=C(C)N=C3O2)OC(C(=O)NC2=C(C)C=CC2C=C)=CC1CC
CC1=CCC(C2CCC2)=C1c1nnc([C@H]2COc3[nH]ccc3O2)n1Nc1nnc(C2CCC2)o1
These structures were generated automatically using a graph-based genetic algorithm (GA), which attempted to build molecules that mimic a reference structure. The reference structure was compound 35 from Akaji et al. (2011), a tetrapeptide transition-state mimic for SARS-CoV(-1) with an IC50 of 98 nM [1]. The intent of this work package was to find non-peptide mimics.

The generative part of the method is based on Jan Jensen's Python GB-GA (https://github.com/jensengroup/GB-GA), but with a custom similarity metric that provides smooth, continuous scoring, including chemical specificity, between 3D structures. Each step of the GA evaluated 42 conformers of the trial compound (generated with Open Babel, scored with RMSD), then maximised the 3D chemical overlap of these conformers against the reference molecule in a single 3D pose. The highest conformer score was used in the GA. The algorithm takes approximately 5 CPU seconds per molecule, a cost that is roughly linear in the number of conformers checked.

In the first round, 18145 separate GA streams, each with a population size of 50 and 100 generations, were seeded with the MPro-XChem hits (i.e. all fragments) and the first 1000 molecules of ZINC, and scored with a purely electrostatic chemical similarity. The elite (highest-scoring) structures from these runs were then used to seed a second round of 9525 GA streams, in which the metric additionally included a vdW dispersion chemical-specificity score (evaluated at the best electrostatic match). The two objectives were scalarised by taking a weighted geometric mean (electrostatic^0.8 * dispersion^0.2). The molecules here are therefore the best of ~138 million evaluated molecules, after 200 generations of evolution. The final outputs were simply ranked by the scoring function, and approx. 30% were discarded by a simple RDKit 'problematic group' filter.
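As a rough illustration of the scalarisation and of the evaluation count quoted above, here is a minimal Python sketch. It is not the code used in this work: the function names are hypothetical, the scores are assumed to be non-negative numbers, and the count assumes that every member of every generation is one evaluation and that the second round also used a population of 50 over 100 generations (which is consistent with the ~138 million figure).

```python
def weighted_geometric_mean(electrostatic, dispersion, w_elec=0.8, w_disp=0.2):
    """Combine two similarity scores as electrostatic^0.8 * dispersion^0.2.

    Assumption: both scores are non-negative. A geometric mean penalises a
    candidate that does badly on either objective, since a zero in one
    score gives a zero overall.
    """
    if electrostatic < 0 or dispersion < 0:
        raise ValueError("scores must be non-negative")
    return (electrostatic ** w_elec) * (dispersion ** w_disp)


def total_evaluations(streams, population, generations):
    """Upper-bound count: one evaluation per individual per generation."""
    return streams * population * generations


# Round 1: electrostatic-only scoring; round 2: combined metric.
round_1 = total_evaluations(streams=18145, population=50, generations=100)
round_2 = total_evaluations(streams=9525, population=50, generations=100)
total = round_1 + round_2

if __name__ == "__main__":
    print(f"round 1: {round_1:,} evaluations")
    print(f"round 2: {round_2:,} evaluations")
    print(f"total:   {total:,} evaluations")
    print(f"example combined score: {weighted_geometric_mean(0.9, 0.6):.3f}")
```

With weights of 0.8 and 0.2 the electrostatic term dominates the ranking, while a very poor dispersion match can still pull a candidate down.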
Clearly the algorithm has been rather keen to put lots of heteroatoms (O, N, S) in place, in order to reproduce a chemical similarity to the peptide backbone. No analysis of stability was made, and no chemist has adjusted or post-processed these structures. Much is evidently missing from the metric, and any suggestions on how to 'fix' a structure with some obviously problematic heteroatoms or groups would be greatly appreciated! The fixed structures could then easily be re-scored, to see whether the metric still considers them similar to the reference.

This work was done in collaboration with Kuano Ltd, and used computer time on the Imperial College Research Computing Service, DOI: [10.14469/hpc/2232](http://doi.org/10.14469/hpc/2232).

The overall aim of this work is to characterise the transition states of the actual substrates of 3CL-pro, and then to use this GB-GA generative method, together with a suitably developed metric, to directly suggest transition-state analogues. Future work will be to develop the metrics to make closer analogues, and, with Kuano, to build a robust platform with more sophisticated synthesis-likelihood and stability filters.

[1] Akaji, K., Konno, H., Mitsui, H., Teruya, K., Shimamoto, Y., Hattori, Y., … Sanjoh, A. (2011). Structure-Based Design, Synthesis, and Evaluation of Peptide-Mimetic SARS 3CL Protease Inhibitors. __Journal of Medicinal Chemistry__, __54__(23), 7962–7973. https://doi.org/10.1021/jm200870n