Submission Details

Molecule(s):

JAR-KUA-8c13982c-1

CC(c1cccc(F)c1)N1CCN(c2cc(C(F)(F)F)nc(-c3cccnc3)n2)CC1

JAR-KUA-8c13982c-2

Cc1nc(-c2ccccc2)cc(N2CCN(S(=O)(=O)c3cccs3)CC2)n1

JAR-KUA-8c13982c-3

Cc1nc(C)c(Br)c(NCCc2cc(F)cc(F)c2)n1

JAR-KUA-8c13982c-4

COc1ccc(-c2nc(-c3ccc4c(c3)OCO4)[nH]c2-c2ccccc2)cc1

JAR-KUA-8c13982c-5

COc1ccc(Cl)cc1Nc1nc(-c2cccnc2)nc2ccccc12

JAR-KUA-8c13982c-6

Cc1cc(C)cc(Oc2nc3c(c(=O)n(C)c(=O)n3C)n2Cc2cccc(Br)c2)c1

JAR-KUA-8c13982c-7

c1cncc(-c2nc(N3CCN(Cc4ccsc4)CC3)c3ccccc3n2)c1

JAR-KUA-8c13982c-8

COc1cc2c(cc1OC)CN(Cc1ccc(C(F)(F)F)cc1)CC2

JAR-KUA-8c13982c-9

COc1ccccc1CN(Cc1cccnc1)Cc1ccc(C(F)(F)F)nc1

JAR-KUA-8c13982c-10

COc1ccccc1N1CCN(C(=O)c2cc(-c3cccnc3)nc3ccccc23)CC1

JAR-KUA-8c13982c-11

Clc1cccc(C2Oc3ccccc3C3CC(c4cccs4)=NN32)c1

JAR-KUA-8c13982c-12

Cc1cccc(C2CC(c3ccc4c(c3)OCO4)=NN2S(C)(=O)=O)c1

JAR-KUA-8c13982c-13

Cc1cc(=O)[nH]c(CN2C[C@@H]3CC[C@H](C2)N3Cc2ccccc2)n1

JAR-KUA-8c13982c-14

COc1ccc(-n2c(-c3ccccc3)nc3nc4ccccc4nc32)cc1Cl

JAR-KUA-8c13982c-15

COc1ccc(N2C(=O)C(Oc3ccc(Cl)cc3Cl)C2c2ccc3c(c2)OCO3)cc1

JAR-KUA-8c13982c-16

COc1ccccc1N1CCN(C2CCN(Cc3ccccc3C(F)(F)F)CC2)CC1

JAR-KUA-8c13982c-17

FC(F)(F)c1cscc1CN1CC2CC(C1)N(Cc1ccccc1)C2

JAR-KUA-8c13982c-18

COc1cc(OC)c(OC)cc1CN1CCN(Cc2cc(OC)c3c(c2)OCO3)CC1

JAR-KUA-8c13982c-19

COc1cc2c(cc1OC)C(c1cccs1)N(S(N)(=O)=O)CC2

JAR-KUA-8c13982c-20

COc1ccc(Cl)cc1C(=O)N1CCN(C(C)c2cccc(F)c2)CC1


Design Rationale:

We first trained baseline predictive models on curated inhibition data from the COVID Moonshot and other public sources, and used them to identify an optimal model by optimising over architectures and hyper-parameters. We ultimately converged on a graph-convolutional deep learning model trained on Moonshot data. To obtain the best possible generalisation, we augmented the dataset with negative data from the XChem crystallographic screen and pre-trained the model on a dataset representing the region of chemical space we wished to explore for synthesis.

In addition, we built an out-of-distribution (OOD) classifier to estimate how reliable the predictive model is when applied to novel regions of chemical space, i.e. to molecules far from those represented in the training set. This was a key step: it provides a handle on the generalisability of deep learning models, which is especially problematic when, as here, training is performed on small datasets.

This machinery was then combined with generative models and reinforcement learning (RL) to explore chemical space for optimal binders; the generative models were trained separately on subsets of large screening libraries. The generated molecules were filtered on synthesizability, physicochemical properties and our transition state similarity metric. Finally, the molecules (or close analogues) were retrieved from the Enamine REAL library and clustered into 80 clusters (some sharing the same or similar cores), from which 20 representative compounds were selected for testing.
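
The OOD classifier itself is not described in detail here, so the following is only a much simpler stand-in for the same idea: flag a candidate as unreliable when it is dissimilar to everything in the training set. This nearest-neighbour Tanimoto check is a common applicability-domain heuristic, not the authors' method. It assumes fingerprints are precomputed as sets of on-bit indices (for example Morgan bits from RDKit); the threshold of 0.4 is an arbitrary illustrative value, not taken from the source.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def max_train_similarity(query: Set[int], train: Iterable[Set[int]]) -> float:
    """Highest similarity of the query to any training-set fingerprint (0.0 if empty)."""
    return max((tanimoto(query, t) for t in train), default=0.0)


def is_in_domain(query: Set[int], train: Iterable[Set[int]], threshold: float = 0.4) -> bool:
    """Illustrative heuristic: trust predictions only for molecules that have a
    sufficiently similar training-set neighbour. The threshold is an assumption."""
    return max_train_similarity(query, train) >= threshold


train_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
print(is_in_domain({1, 2, 3, 9}, train_fps))   # nearest neighbour similarity 0.6 -> True
print(is_in_domain({10, 11, 12}, train_fps))   # no shared bits -> False
```

A trained classifier, as used in the original work, can learn a richer notion of "distance" than a single similarity cut-off, but the role in the pipeline is the same: down-weighting or discarding predictions for molecules outside the training distribution.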

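The final clustering and selection step can be sketched as below. The source does not say which clustering algorithm or selection criterion was used; this sketch assumes greedy sphere-exclusion (leader) clustering on Tanimoto similarity, processed in descending order of a per-molecule score (for example a predicted potency), so that each cluster's leader is its best-scoring member and representatives are the top-scoring leaders. The scores, threshold and fingerprints in the example are hypothetical.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def cluster_and_select(
    fps: List[Set[int]], scores: List[float], threshold: float, n_pick: int
) -> Tuple[List[List[int]], List[int]]:
    """Sphere-exclusion clustering in descending score order.

    Each still-unassigned molecule becomes a cluster leader and absorbs every
    unassigned molecule with similarity >= threshold. Because leaders are
    visited best-first, clusters come out ordered by leader score, and the
    first n_pick leaders are the selected representatives.
    """
    order = sorted(range(len(fps)), key=lambda i: scores[i], reverse=True)
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        assigned.add(i)
        members = [i]
        for j in order:
            if j not in assigned and tanimoto(fps[i], fps[j]) >= threshold:
                assigned.add(j)
                members.append(j)
        clusters.append(members)
    representatives = [c[0] for c in clusters[:n_pick]]
    return clusters, representatives


fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {20, 21}]
scores = [0.9, 0.5, 0.7, 0.8, 0.1]
clusters, reps = cluster_and_select(fps, scores, threshold=0.5, n_pick=2)
print(clusters)  # [[0, 1], [3, 2], [4]]
print(reps)      # [0, 3]
```

Unlike the fixed 80 clusters reported above, the number of clusters here depends on the similarity threshold, which would have to be tuned to reach a target count.
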
Discussion: