CN1CCCc2ccc(S(=O)(=O)C(=O)N3Cc4ccccc4C(c4ccccc4)C3)cc21
O=C(CCl)N1CC2C(=CC=C2C2CN(C(=O)CCl)Cc3ccccc32)C(c2ccccc2)C1
O=C(c1cccc2ccccc12)N1Cc2ccccc2C(c2ccccc2)C1
O=C(C(=O)N1Cc2ccccc2C(c2ccccc2)C1)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1
Cc1ccc(NC(=O)CN2Cc3ccccc3C(c3ccccc3)C2)cc1N1CCCC1=O
NS(=O)(=O)c1ccc2c(c1)N(CC(=O)N1Cc3ccccc3C(c3ccccc3)C1)CCC2
CC1CN(C(=O)N2Cc3ccccc3C(c3ccccc3)C2)CCO1
O=C(CN1Cc2ccccc2C(c2ccccc2)C1)N1Cc2ccccc2C(c2ccccc2)C1
O=C(Nc1ccccc1)N1CCC(N2Cc3ccccc3C(c3ccccc3)C2)CC1
Cc1nc(-c2ccccc2)c(NC(=O)C(=O)N2CCC(C(=O)Nc3ccccc3O)CC2)s1
O=C(Nc1ccccc1O)C1CCN(C(=O)CCC(=O)N2Cc3ccccc3C(c3ccccc3)C2)CC1
NS(=O)(=O)c1ccc(C(=O)N2Cc3ccccc3C(c3ccccc3)C2)cc1
Cc1cc(N2Cc3ccccc3C(c3ccccc3)C2)ccc1CS(N)(=O)=O
Cc1ccc(C)c(S(=O)(=O)N2CCN(C(=O)N3CCCC(c4nc5ccccc5s4)C3)CC2)c1
c1ccc(C2CN(c3ccc(Oc4ncccn4)cc3)Cc3ccccc32)cc1
O=C(Nc1ccccc1O)C1CCN(N2Cc3ccccc3C(c3ccccc3)C2)CC1
CS(=O)(=O)Nc1cccc(C(=O)CC(=O)N(c2ccc(F)cc2)C2C=CS(=O)(=O)C2)c1
CC(=O)NCCc1c[nH]c2ccc(CC(=O)N3CCN(S(=O)(=O)c4cc(C)ccc4C)CC3)cc12
O=C(CC(=O)N1CCC(C(=O)N2CCCCC2)CC1)N1CCC(C(=O)N2CCCCC2)CC1
I adapted Jan Jensen's graph-based genetic-algorithm (GB-GA) code to take the 92 fragments from Diamond as the input population, and then carried out ~100'000 crossover and mutation operations against this population. Each generated child SMILES has two parents. Each SMILES was then used to generate a 3D conformer (with `smi23d`), which was refined with the MMFF94 force field in `obminimize`. The resulting candidate molecule was then docked with Autodock Vina against 6YB7, with the search region selected near the catalytic site. From these 100'000 data points, the 20 highest scoring docked compounds were selected for submission. No additional human or algorithmic filtering has been done. (Note the double sulphone warhead in one of the molecules!)

I chose this method due to bitter experience in molecular design with both computational and machine learning approaches. Because chemical configuration space is exponentially large, any attempt to optimise a metric with errors (such as docking) will tend to select one of the much more numerous false positives rather than a genuine hit. By constraining the design algorithm to mixing two known-valid molecules (the fragments from Diamond), you strongly bias the exploration of chemical space towards plausible regions, while giving the algorithm sufficient freedom to select arbitrary combinations that would not be considered by a human. As the structures are two fused fragments, they are approaching the size of a drug-like molecule.

All of these structures docked (Vina) at between -9.0 and -9.7 kcal/mol. For all 200 top scoring structures and docked `pdbqt` files, see: https://github.com/QuantumCorona/AutodockVina_3CL-pro/tree/master/0009_hpc_fragment_outputs
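
For anyone wanting to re-rank the docked files themselves, the selection step can be sketched roughly as below. This is not the script used for the submission, only a minimal illustration. It assumes each docked `pdbqt` file carries `REMARK VINA RESULT:` lines, as Vina output normally does, and the function names `best_vina_score` and `top_candidates` are made up for this example.

```python
import heapq


def best_vina_score(pdbqt_text):
    """Return the most negative affinity (kcal/mol) in a Vina output, or None."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # First field after the tag is the affinity; the next two are RMSD bounds.
            scores.append(float(line.split(":", 1)[1].split()[0]))
    return min(scores) if scores else None


def top_candidates(docked, n=20):
    """Rank docked molecules by best affinity.

    `docked` maps an identifier (for example a SMILES string) to the text of
    its docked pdbqt file. Returns up to `n` (identifier, score) pairs, most
    negative score first. Entries without a Vina result are skipped.
    """
    scored = []
    for identifier, text in docked.items():
        score = best_vina_score(text)
        if score is not None:
            scored.append((score, identifier))
    return [(identifier, score) for score, identifier in heapq.nsmallest(n, scored)]
```

Reading each file into a dictionary keyed by SMILES and calling `top_candidates(docked, n=20)` would give a shortlist in the same spirit as the one submitted here, with no filtering beyond the docking score.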
I'm sorry, but I did not retain information about which two fragments are each compound's parents. This can probably be worked out by eye if there's sufficient interest in a given molecule. For future rounds, I'll add some code to the GA step that tracks this information. This work benefited enormously from Jan Jensen's open-source GB-GA Python code, and from compute resources provided by Imperial College London's Research Computing Service. I trained as a physicist, so I lack the skills to 'eyeball' these predictions.
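
On the parent-tracking point above, a minimal sketch of what the bookkeeping might look like in a future round follows. It is not part of GB-GA: `breed_with_lineage` is a hypothetical wrapper, and `crossover` stands for whatever two-parent function the GA provides, assumed here to return a child SMILES string or `None` on failure. Mutation is left out for brevity; it would be applied to the child before recording.

```python
import random


def breed_with_lineage(population, crossover, n_children, seed=None):
    """Run crossover attempts and remember which two parents made each child.

    population : list of parent SMILES strings (the fragments)
    crossover  : callable (parent_a, parent_b) -> child SMILES or None
                 (hypothetical interface, stands in for the GA's own function)
    n_children : number of crossover attempts
    Returns a dict mapping child SMILES -> (parent_a, parent_b).
    """
    rng = random.Random(seed)
    lineage = {}
    for _ in range(n_children):
        # Draw two distinct parents from the fragment population.
        parent_a, parent_b = rng.sample(population, 2)
        child = crossover(parent_a, parent_b)
        if child is None:
            continue  # failed crossover, nothing to record
        # If the same child appears twice, keep the first recorded pair.
        lineage.setdefault(child, (parent_a, parent_b))
    return lineage
```

Writing the returned dictionary out alongside the docking results would let each submitted compound be traced back to its two fragments without working it out by eye.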