CC(c1cccc(F)c1)N1CCN(c2cc(C(F)(F)F)nc(-c3cccnc3)n2)CC1
Cc1nc(-c2ccccc2)cc(N2CCN(S(=O)(=O)c3cccs3)CC2)n1
Cc1nc(C)c(Br)c(NCCc2cc(F)cc(F)c2)n1
COc1ccc(-c2nc(-c3ccc4c(c3)OCO4)[nH]c2-c2ccccc2)cc1
COc1ccc(Cl)cc1Nc1nc(-c2cccnc2)nc2ccccc12
Cc1cc(C)cc(Oc2nc3c(c(=O)n(C)c(=O)n3C)n2Cc2cccc(Br)c2)c1
c1cncc(-c2nc(N3CCN(Cc4ccsc4)CC3)c3ccccc3n2)c1
COc1cc2c(cc1OC)CN(Cc1ccc(C(F)(F)F)cc1)CC2
COc1ccccc1CN(Cc1cccnc1)Cc1ccc(C(F)(F)F)nc1
COc1ccccc1N1CCN(C(=O)c2cc(-c3cccnc3)nc3ccccc23)CC1
Clc1cccc(C2Oc3ccccc3C3CC(c4cccs4)=NN32)c1
Cc1cccc(C2CC(c3ccc4c(c3)OCO4)=NN2S(C)(=O)=O)c1
Cc1cc(=O)[nH]c(CN2C[C@@H]3CC[C@H](C2)N3Cc2ccccc2)n1
COc1ccc(-n2c(-c3ccccc3)nc3nc4ccccc4nc32)cc1Cl
COc1ccc(N2C(=O)C(Oc3ccc(Cl)cc3Cl)C2c2ccc3c(c2)OCO3)cc1
COc1ccccc1N1CCN(C2CCN(Cc3ccccc3C(F)(F)F)CC2)CC1
FC(F)(F)c1cscc1CN1CC2CC(C1)N(Cc1ccccc1)C2
COc1cc(OC)c(OC)cc1CN1CCN(Cc2cc(OC)c3c(c2)OCO3)CC1
COc1cc2c(cc1OC)C(c1cccs1)N(S(N)(=O)=O)CC2
COc1ccc(Cl)cc1C(=O)N1CCN(C(C)c2cccc(F)c2)CC1
We first trained baseline predictive models on curated inhibition data from the COVID Moonshot and other public sources, and used these to select the best-performing model by optimising over architectures and hyper-parameters. We ultimately converged on a graph-convolutional deep learning model trained on Moonshot data. To obtain the best possible generalisation, the dataset was augmented with negative data (from the XChem crystallographic screen) and the model was pre-trained on a dataset representing the area of chemical space we wished to explore for synthesis. In addition, we built an out-of-distribution (OOD) classifier to estimate how reliable the predictive model is when applied to novel areas of chemical space. This was a key step: generalisation of deep learning models is especially problematic when, as here, training is performed on small datasets, and the OOD estimate let us gauge the reliability of the predictions for molecules far from the chemical space represented in the training set. This machinery was then used in conjunction with generative models and reinforcement learning (RL) to explore chemical space for optimal binders; the generative models were trained separately on subsets of large screening libraries. The generated molecules were filtered by synthesizability, by physicochemical properties and by our transition state similarity metric. Finally, the molecules (or close analogues) were selected from the REAL library and clustered (80 clusters, some of which share the same or similar cores), and representatives (20 compounds) were selected for testing.
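
The last two stages (reliability estimate, then clustering and selection of representatives) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method actually used: the text does not specify the fingerprints, the clustering algorithm or the form of the OOD classifier. Here fingerprints are toy sets of integer bit indices, `ood_score` is a simple nearest-neighbour distance standing in for the OOD classifier, `leader_cluster` is a greedy leader clustering with a hypothetical 0.5 similarity threshold, and `pick_representatives` keeps the best-scoring member of each cluster. All function names and all scores are hypothetical.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # toy fingerprint: set of "on" bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two bit sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def ood_score(query: Fingerprint, training: List[Fingerprint]) -> float:
    """Stand-in for an OOD estimate (assumption): 1 minus the highest
    similarity to any training molecule. Higher means further away."""
    return 1.0 - max(tanimoto(query, t) for t in training)


def leader_cluster(fps: Dict[str, Fingerprint], threshold: float) -> List[List[str]]:
    """Greedy leader clustering: a molecule joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def pick_representatives(clusters: List[List[str]],
                         scores: Dict[str, float], n: int) -> List[str]:
    """Take the best-scoring member of each cluster, then keep the top n."""
    best = [max(cluster, key=lambda m: scores[m]) for cluster in clusters]
    return sorted(best, key=lambda m: scores[m], reverse=True)[:n]


# Toy data (hypothetical): five molecules with made-up fingerprints and scores.
fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({10, 11, 12}),
    "D": frozenset({10, 11, 13}),
    "E": frozenset({20, 21}),
}
scores = {"A": 0.7, "B": 0.9, "C": 0.4, "D": 0.3, "E": 0.8}
training = [fps["A"], fps["C"]]

clusters = leader_cluster(fps, threshold=0.5)
picks = pick_representatives(clusters, scores, n=2)
print(clusters)  # [['A', 'B'], ['C', 'D'], ['E']]
print(picks)     # ['B', 'E']
print({name: round(ood_score(fp, training), 2) for name, fp in fps.items()})
```

In this toy run molecule E shares no bits with the training set, so its score is the maximum of 1.0. A real workflow would flag such a molecule as lying outside the region where the predictive model can be trusted, whatever its predicted activity.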