O=C(c1cc2sccc2s1)N1CCOC(CN2CCOCC2)C1
c1ccc(OC2CN(Cc3c[nH]cn3)C2)cc1
CN1CCN(C(=O)CNc2c(S(N)(=O)=O)ccc3ccccc23)CC1
c1cc(CN2CC3(CCOC3)C2)c2cccnc2c1
O=S1(=O)c2ccccc2CN1CCN1CCCCC1
O=S(=O)(Nc1nsc2ccccc12)c1ccc2ccccc2c1
c1ccc(N(Cc2ccsc2)Cc2cccc(CN3CCOCC3)c2)cc1
c1coc(CC2CN(Cc3cc4ccccc4[nH]3)C2)c1
c1ccc2ncc(CN3CC(Cc4ccoc4)C3)cc2c1
Cc1ccnc(CN2CCCC3(CCOC3)C2)c1
Cc1ccc2cc(CNCC(=O)N3CCN(C)CC3)[nH]c2c1
CS(=O)(=O)c1ccc(CNc2nc3ccccc3[nH]2)s1
CCc1ccc(CN2CCN(CCc3ccns3)CC2)cc1
CCNCc1cn(CC(=O)N2CCN(C)CC2)nn1
Our main goal was the discovery of new inhibitors of M-pro using machine learning. We started with the M-pro crystallography dataset and a literature binding-affinity dataset, which we curated carefully: removing duplicates, selecting the highest-quality data sources, stripping salts, filtering on heavy atoms, and so on. We used this dataset to train a deep-learning classification model based on a graph convolutional architecture. Selecting a large enough number of negative data points to train the model on was crucial for effective screening; otherwise false positives end up dominating the output and destroy any meaningful chance of selecting binders. Team members working on this submission have extensive experience in getting these criteria right. Fortunately the dataset was diverse enough to enable an efficient virtual screening process, and from prior experience a hit rate as high as 5% would not be an unreasonable expectation. Virtual screening itself was performed on pre-curated subsets of the REAL diverse dataset, using this model as the scoring metric. On top of that, we applied a number of selection criteria in post-processing: synthesisability was the priority, but novelty and stability were also considered.
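The curation step described above (deduplication, salt stripping, source-quality selection) can be sketched roughly as follows. This is a hypothetical illustration only: a real pipeline would use RDKit for canonicalisation and fragment handling, whereas here plain string handling stands in for it, and the record format (`smiles`, `affinity`, `source_quality`) is an assumption, not the actual schema we used.

```python
def strip_salts(smiles: str) -> str:
    """Crude salt stripping: keep the largest dot-separated fragment.
    (A real pipeline would canonicalise with RDKit instead.)"""
    return max(smiles.split("."), key=len)

def curate(records):
    """records: iterable of (smiles, affinity, source_quality) tuples.
    Strips counter-ion fragments, then deduplicates by parent SMILES,
    keeping the entry from the highest-quality source."""
    best = {}
    for smiles, affinity, quality in records:
        parent = strip_salts(smiles)
        if parent not in best or quality > best[parent][1]:
            best[parent] = (affinity, quality)
    return {smi: aff for smi, (aff, _) in best.items()}

raw = [
    ("CCO.Cl", 5.2, 1),   # salt form, lower-quality source
    ("CCO", 5.5, 3),      # duplicate parent, higher-quality source
    ("c1ccccc1O", 4.1, 2),
]
print(curate(raw))  # → {'CCO': 5.5, 'c1ccccc1O': 4.1}
```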
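For intuition, the core operation of a graph convolutional classifier of the kind mentioned above is neighbourhood aggregation followed by a graph-level readout. The dependency-free sketch below shows one such layer on a toy molecular graph; the weights are illustrative placeholders, not trained parameters, and the actual model used a full deep-learning framework rather than hand-rolled layers.

```python
def graph_conv(features, adjacency, weight):
    """One aggregation step: each atom's new feature vector is a linear
    transform (rows of `weight`) of the sum of its own and its
    neighbours' current feature vectors."""
    out = []
    for i, feat in enumerate(features):
        agg = list(feat)  # self-loop contribution
        for j in adjacency[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        out.append([sum(w * a for w, a in zip(row, agg)) for row in weight])
    return out

def readout(features):
    """Mean-pool atom features into one molecule-level vector, which a
    final classification layer would score as binder / non-binder."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]

# Toy two-atom graph (e.g. a single bond), identity weights:
feats = [[1.0, 0.0], [0.0, 1.0]]
adj = {0: [1], 1: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h = graph_conv(feats, adj, identity)
print(readout(h))  # → [1.0, 1.0]
```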
We didn't use x0072 as a fragment!