Submission Details

Molecule(s):
COC(=O)c1ccc(S(=O)(=O)NCc2ccc(C(C)NC(=O)OC(C)(C)C)cc2)c(Br)c1

VIJ-CYC-1a381570-1

COCc1ncc(NS(=O)(=O)c2ccc(C3(C#N)CC3)cc2)c(C)n1

VIJ-CYC-1a381570-2

CNC(=O)c1ccc(CS(=O)(=O)NCC(F)(F)c2ccc(F)cc2F)cc1

VIJ-CYC-1a381570-3

COc1ccccc1C1(CNC(=O)C(=O)NCc2ccc(-c3noc(C)n3)cc2)CCC1

VIJ-CYC-1a381570-4

C=C1CC1C(=O)NC(C)c1ccc(S(=O)(=O)NC)cc1

VIJ-CYC-1a381570-5

CCc1ccc(-c2noc(C(C)NC(=O)C(=O)NC(C)c3ccc(S(=O)(=O)NC)cc3)n2)cc1

VIJ-CYC-1a381570-6

Cc1nc(-c2ccc(CNC(=O)C(=O)NCc3ccc(S(=O)(=O)N(C)C(C)C)cc3)cc2)no1

VIJ-CYC-1a381570-7

CNS(=O)(=O)c1ccc(C(C)NC(=O)C(=O)NC2CCC(NS(C)(=O)=O)CC2)cc1

VIJ-CYC-1a381570-8

N#CC1(c2ccc(S(=O)(=O)NCCOc3c[nH]nc3C(F)(F)F)cc2)CC1

VIJ-CYC-1a381570-9

C=CC(=O)NCc1ccc(C(C)NC(=O)OC(C)(C)C)cc1

VIJ-CYC-1a381570-10

CNS(=O)(=O)c1ccc(C(C)NC(=O)NC(C)(C)C)cc1

VIJ-CYC-1a381570-11

CC(C)[C@@H](C#N)NS(=O)(=O)c1ccc(CF)cc1

VIJ-CYC-1a381570-12

COC(=O)c1sc(NS(=O)(=O)c2ccc(CF)cc2)nc1C(C)(C)C

VIJ-CYC-1a381570-13


Design Rationale:

We wanted our compound exploration to be informed by the set of fragments without relying on exact fragment matches. To enable this, we used Cyclica’s Ligand Design™ (LD), a flexible machine learning approach for identifying molecules that best satisfy a defined objective function. Briefly, LD traverses chemical space iteratively: it repeatedly derives new molecules (children) from previous selections (parents) and then filters them with respect to one or more objective functions. This optimization proceeds until convergence, when no better molecules can be found.

Ligand Design has multiple options for traversing chemical space, including fixed library screening (10^6 molecules evaluated), rule-based semi-generative approaches (~10^10 molecules represented), and fully generative options (~10^120 possible molecules). For this design challenge, we used the semi-generative option (developed with support from our partners at Enamine), which efficiently traverses the Enamine REAL Space to identify promising molecules with high synthetic accessibility.

We applied two major selective pressures (objective functions). The first optimizes for compatibility with the 3CLpro binding site using Cyclica’s MatchMaker deep learning proteome-screening engine (white paper: https://tinyurl.com/wx3yg64). MatchMaker has been found to be far more accurate than molecular docking and is also several orders of magnitude more computationally efficient. The second is a ligand-based model built from all active fragments from the XChem screen. This model was created using our POEM technology (https://arxiv.org/abs/2002.04555), a parameter-free supervised learning algorithm that uses multiple fingerprints simultaneously for improved predictive performance.
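The parent/child optimization loop described above can be illustrated with a minimal sketch. This is not Cyclica’s implementation: the function names (`optimize`, `derive_children`), the `keep` and `max_rounds` parameters, the summing of objectives, and the toy integer "molecules" are all assumptions made for illustration only.

```python
def optimize(seeds, derive_children, objectives, keep=5, max_rounds=100):
    """Derive children from parents, keep the best scorers, stop at convergence.

    Hypothetical sketch: objectives are simply summed, which is an assumption;
    the source does not describe how the selective pressures are combined.
    """
    def score(mol):
        return sum(objective(mol) for objective in objectives)

    parents = sorted(set(seeds), key=score, reverse=True)[:keep]
    best = score(parents[0])
    for _ in range(max_rounds):
        # Every parent proposes new candidate molecules (children).
        children = {child for p in parents for child in derive_children(p)}
        # Filter parents and children together by the combined objective.
        parents = sorted(set(parents) | children, key=score, reverse=True)[:keep]
        new_best = score(parents[0])
        if new_best <= best:
            break  # converged: no better molecule was found this round
        best = new_best
    return parents


# Toy stand-ins (assumptions): a "molecule" is an integer, children are its
# neighbours, and the two objectives mimic two independent selective pressures.
def neighbours(n):
    return [n - 1, n + 1]

def target_fit(n):          # stands in for a binding-site compatibility score
    return -(n - 42) ** 2

def fragment_likeness(n):   # stands in for a ligand-based model score
    return 1.0 if n % 2 == 0 else 0.0

top = optimize([0, 10], neighbours, [target_fit, fragment_likeness])
```

In this toy setting the loop climbs from the seeds toward the single best-scoring value and stops in the first round that yields no improvement, which mirrors the "until convergence" behaviour in the description.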
Incorporating this POEM model pressures LD to evolve molecules that resemble the fragment set (and prioritizes fragments with higher occupancy in the crystal structures). Additional filters ensure that all molecules evaluated have ‘drug-like’ physicochemical properties and exclude problematic functional groups. Cyclica’s Ligand Design was applied using default parameters, allowing the optimization to proceed until convergence.

The process automatically identified a total of 106 ‘top molecules’ on the basis of proteome screening ranks. For 35 of these, the coronavirus protease ranked as more compatible than any human protein (~8000 present in the screen).

A semi-manual curation process was then applied to the 106 to prioritize molecules for submission. The first three were selected because they contain substructures that exactly match hits from the XChem fragment screen. Next, we clustered the remaining top molecules using a multiple-fingerprint approach and selected the 10 cluster exemplars, for a total of 13 submissions. Any positive hits among cluster exemplars would merit subsequent investigation of the remaining cluster members.

Note: for this submission, Cyclica’s Ligand Design was used in its default state, with no algorithmic customization or further optimization to strictly enforce the presence of select fragments. We are open to customizing the LD design process in subsequent design iterations to incorporate the expert insights of the medicinal chemists involved in this project.
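The clustering step of the curation can be sketched as follows. The rationale does not say which clustering algorithm, fingerprints, or exemplar criterion were used, so everything here is an assumption: leader clustering with a similarity threshold, mean Tanimoto similarity across fingerprint types, and the medoid as exemplar. Fingerprints are represented as plain Python sets of feature IDs, and the molecule names and feature values are made up.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def multi_fp_similarity(fps_a, fps_b):
    """Mean Tanimoto similarity across several fingerprint types (assumed)."""
    return sum(tanimoto(fps_a[k], fps_b[k]) for k in fps_a) / len(fps_a)

def cluster(mols, threshold=0.5):
    """Leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []
    for name, fps in mols.items():
        for members in clusters:
            if multi_fp_similarity(mols[members[0]], fps) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])  # no similar leader: start a new cluster
    return clusters

def exemplar(members, mols):
    """Medoid: the member with the highest total similarity to its cluster."""
    return max(
        members,
        key=lambda m: sum(multi_fp_similarity(mols[m], mols[o]) for o in members),
    )

# Hypothetical molecules, each with two toy fingerprint types.
mols = {
    "A": {"fp1": {1, 2, 3}, "fp2": {10, 11}},
    "B": {"fp1": {1, 2, 3, 4}, "fp2": {10, 11}},
    "C": {"fp1": {1, 2}, "fp2": {10, 11, 12}},
    "D": {"fp1": {7, 8, 9}, "fp2": {20, 21}},
    "E": {"fp1": {7, 8}, "fp2": {20, 21}},
}
clusters = cluster(mols)
exemplars = [exemplar(members, mols) for members in clusters]
```

Submitting one exemplar per cluster keeps the submission chemically diverse, and a hit on an exemplar points back to the other members of its cluster for follow-up.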

Other Notes:

3 of the 13 submitted molecules contain fragments as substructures. The remaining 10 molecules, as described in the Design Rationale, were generated under a selective pressure from a model based on the active fragments.

Inspired By:
Discussion: