CCC1N=C[C@@H](NC(=O)NC(C)C)C2=NN3C=C(Cl)[C@@H](n4ccc(C(C)=O)n4)C3=NN21
CCNC(=O)N[C@@H]1C(C)=CN2N=C3[C@H](c4nnc(=O)[nH]n4)NC=C(C)N3N=C12
CC(C)NC(=O)N[C@@H]1N=CCN2N=C3[C@H](NC(C)C(N)=O)N=CCN3N=C12
These were generated by a GB-GA (graph-based genetic algorithm) in a similar fashion to https://discuss.postera.ai/t/submission-jar-imp-b007c7c2/1298 . The reference molecule was Boceprevir. This is the result of an overnight run of 40,000 streams of the GA, each running 25 generations with a population of 100. The GAs start from a set of all the Diamond fragments and a thousand molecules from ZINC. Two tweaks were made to the code for this run: each stream was initialised on the fly from the elite (top-scoring molecules) of previous GA runs, effectively chaining together otherwise parallel runs, and the tournament scoring was adjusted to increase diversity. (Overall, 100 million molecules were generated and scored.) The submitted molecules were hand-picked from the top few hundred scoring molecules overall, because they contain unusual fused macrocycles that the scoring algorithm judges to be similar to the middle of Boceprevir, combined with alternatives for the peptide-like sidechains. Most high-scoring molecules were oligomers of heterocycles. Each molecule was lightly cleaned by hand (generally heteroatom -> carbon, or halogen -> hydrogen) to remove reactive groups. This work was done in collaboration with Kuano Ltd, and used computer time on the Imperial College Research Computing Service, DOI: 10.14469/hpc/2232.
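As a rough illustration of the chaining described above, here is a minimal Python sketch of GA streams that seed each other from an elite pool. It is not the code used for this submission: the names `run_stream`, `chain_streams`, `tournament_select`, `elite_size` and the tournament size `k` are hypothetical, the individuals are plain integers rather than molecules, the toy `score` and `mutate` functions stand in for the similarity-to-Boceprevir scoring and the graph-based mutation/crossover, and the diversity tweak to the tournament scoring is not reproduced. The streams are also run one after another here for simplicity.

```python
import random


def tournament_select(population, scores, k, rng):
    """Draw k distinct individuals at random and return the best-scoring one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: scores[i])
    return population[best]


def run_stream(seed_pool, score, mutate, generations=25, pop_size=100, k=3, rng=None):
    """Run one GA stream; return the final population and the number of scorings."""
    rng = rng or random.Random()
    population = [rng.choice(seed_pool) for _ in range(pop_size)]
    evaluated = 0
    for _ in range(generations):
        scores = [score(ind) for ind in population]
        evaluated += len(population)  # counts only scorings done inside the loop
        population = [
            mutate(tournament_select(population, scores, k, rng), rng)
            for _ in range(pop_size)
        ]
    return population, evaluated


def chain_streams(initial_pool, score, mutate, n_streams, elite_size=10, **kw):
    """Run streams in sequence, seeding each one with the elite found so far."""
    elite, total = [], 0
    for s in range(n_streams):
        pool = list(initial_pool) + elite  # starting set plus previous elite
        final_pop, n = run_stream(pool, score, mutate, rng=random.Random(s), **kw)
        total += n
        candidates = set(final_pop) | set(elite)
        elite = sorted(candidates, key=score, reverse=True)[:elite_size]
    return elite, total


# Toy stand-ins (assumptions): integers instead of molecules, distance to a
# target instead of similarity to a reference molecule.
def toy_score(x):
    return -abs(x - 42)


def toy_mutate(x, rng):
    return x + rng.choice([-1, 0, 1])


elite, total = chain_streams([0, 5, 10], toy_score, toy_mutate, n_streams=4)
print(elite, total)
```

With the run parameters quoted above, each stream scores 25 x 100 = 2,500 candidates, so 40,000 streams give 40,000 x 2,500 = 100 million scored molecules, which matches the total stated in the description.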