CC(=O)Nc1cnc(NN2CCC(c3cccc(C)c3Cl)CC2)nc1
N#Cc1cc(S(=O)(=O)c2ccoc(=O)c2)ccc1N1CCO[C@@H]2CCC[C@H]21
COc1cccc(CNC(=O)N2CCN(C(=O)N3CCCCC3)CC2)c1
A generative language model of SMILES strings was trained on the set of 1,946 current submissions (as of March 27, 2020) and 66 active site fragments. The model was sampled to produce ~20,000 novel structures. During generation, any molecule identical to a submission was discarded, as was any molecule with a Tanimoto similarity >= 0.9 to any submission or fragment molecule. The generated compounds generally had low average Tanimoto similarity to the submission and fragment compounds. The QED score was computed for each generated molecule, and the top 1,000 molecules by QED score were kept.

This set of 1,000 generated molecules was combined with a set of 1,018 molecules generated on a previous date (using a different language model). Each of the 2,018 generated molecules was then docked against the SARS-CoV-2 apo-MPro protein structure from the 6YB7_model.pdb file (provided by Diamond Light Source), using AutoDock Vina with an exhaustiveness value of 16. Docking was focused on the active site. Molecules with a best mode binding energy of -7.9 kcal/mol or lower were kept, resulting in 30 candidates.

These 30 candidates were then docked again with an exhaustiveness value of 256, and this docking was repeated 3 times for each candidate. The Synthetic Accessibility (SA) score was also calculated for each candidate, along with a toxicity estimate (LD50) from the ProTox-II web service. A final weighted score was computed for each candidate from the normalized mean Vina best mode binding energy (50%), the QED score (20%), the normalized SA score (20%), and the normalized estimated LD50 (10%). The top 3 candidates are submitted here.
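The weighting described above can be sketched in a few lines of Python. This is only an illustration, not the code used for this submission: the normalization method is not stated in the text, so min-max scaling is assumed here, along with the direction of each term (lower binding energy and lower SA score treated as better, higher LD50 treated as better). The names `Candidate`, `min_max`, and `weighted_scores` are hypothetical. Because the original normalization was presumably done over all 30 candidates, running this on a smaller set will not reproduce the reported weighted scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    vina_kcal_mol: float  # mean best mode binding energy (more negative = better)
    qed: float            # already in [0, 1], used without rescaling
    sa: float             # synthetic accessibility score (lower = easier)
    ld50_mg_kg: float     # estimated LD50 (higher = less toxic)


# Weights taken from the description: 50% Vina, 20% QED, 20% SA, 10% LD50.
WEIGHTS = {"vina": 0.5, "qed": 0.2, "sa": 0.2, "ld50": 0.1}


def min_max(values, higher_is_better):
    """Scale values to [0, 1] so that 1.0 is always the best value.

    ASSUMPTION: min-max scaling; the original normalization is not specified.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no candidate is better than another.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_scores(candidates):
    """Return {name: weighted score} for a list of candidates."""
    vina = min_max([c.vina_kcal_mol for c in candidates], higher_is_better=False)
    sa = min_max([c.sa for c in candidates], higher_is_better=False)
    ld50 = min_max([c.ld50_mg_kg for c in candidates], higher_is_better=True)
    return {
        c.name: (
            WEIGHTS["vina"] * v
            + WEIGHTS["qed"] * c.qed
            + WEIGHTS["sa"] * s
            + WEIGHTS["ld50"] * t
        )
        for c, v, s, t in zip(candidates, vina, sa, ld50)
    }


if __name__ == "__main__":
    # Values for the three submitted candidates, as reported in this post.
    top3 = [
        Candidate("Candidate 1", -8.47, 0.872, 2.694, 2500),
        Candidate("Candidate 2", -8.00, 0.796, 3.648, 5000),
        Candidate("Candidate 3", -7.90, 0.898, 2.014, 2000),
    ]
    for name, score in weighted_scores(top3).items():
        print(f"{name}: {score:.3f}")
```

Rescaling each term so that 1.0 is always "best" lets the four terms be added directly, and keeps the final score in the range 0 to 1 as long as the weights sum to 1.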
Candidate 1: CC(=O)Nc1cnc(NN2CCC(c3cccc(C)c3Cl)CC2)nc1
Weighted Score: 0.898
Mean Best Mode Binding Energy: -8.47 +/- 0.19 kcal/mol (n=3)
QED: 0.872
SA: 2.694
LD50: 2500 mg/kg

Candidate 2: N#CC1C(N2[C@@H]3CCC[C@H]3OCC2)=CC=C(S(C2=CC(=O)OC=C2)(=O)=O)C=1
Weighted Score: 0.882
Mean Best Mode Binding Energy: -8.00 +/- 0.00 kcal/mol (n=3)
QED: 0.796
SA: 3.648
LD50: 5000 mg/kg

Candidate 3: COc1cccc(CNC(=O)N2CCN(C(=O)N3CCCCC3)CC2)c1
Weighted Score: 0.877
Mean Best Mode Binding Energy: -7.90 +/- 0.00 kcal/mol (n=3)
QED: 0.898
SA: 2.014
LD50: 2000 mg/kg
The PDB files for candidates 2 and 3 will be submitted through the forum thread for this submission.