Docking Scores
Description: Docking is a theoretical evaluation of the affinity (the free energy change of the binding process) between a ligand (a small molecule) and a target (a protein involved in a disease pathway). A docking evaluation usually includes conformational sampling of the ligand and a calculation of the free energy change. A molecule with higher affinity usually has a higher potential to possess higher bioactivity.
De novo molecular generation has focused on simple heuristic oracles, such as QED and LogP. These oracles are either too easy to optimize or can produce unrealistic molecules. Coley et al. (2019) [1] aptly summarize the problem: “The current evaluations for generative models do not reflect the complexity of real discovery problems.” Recent work by Cieplinski et al. (2020) [2] is likewise titled "We Should At Least Be Able To Design Molecules That Dock Well." We therefore include a meta oracle based on molecular docking. We adopted the Python wrapper from pyscreener [3], which provides easy access to various docking programs, including vina, smina, qvina2, psovina, and DOCK6. Users can specify their own targets of interest, and we also provide several typical docking oracles for the leaderboard.
Important Note: if you use the docking score oracle function, please cite Graff et al. "Accelerating high-throughput virtual screening through molecular pool-based active learning."!
Installation instructions:
- 0. Set up the TDC conda environment:
conda create -n tdc_env python=3.7
conda activate tdc_env
conda install -c conda-forge pytdc
- 1. Install PyScreener following https://github.com/coleygroup/pyscreener#installation
- 2. Install the external docking software you need. We use AutoDock Vina as an example:
- 1) Install ADFR. Detailed instructions for each OS are available at https://ccsb.scripps.edu/adfr/downloads/.
- 2) Install Vina. Detailed instructions are available at http://vina.scripps.edu/.
- 3) Add the ADFR/Vina bin path to the PATH variable in your .bashrc / .bash_profile. For example,
# in .bashrc or .bash_profile
export PATH=$PATH:XXXX/autodock_vina_1_1_2_linux_x86/bin/
export PATH=$PATH:XXXX/ADFRsuite-1.0/bin/
# make sure each path points to the bin folder
Please open an issue if you run into any problems. Check out this GitHub issue for FAQs.
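After editing your shell profile and opening a new shell, you can confirm that the docking executables are discoverable from Python. This is a minimal sketch that assumes the binaries are named `vina` and `prepare_receptor` (the latter from ADFRsuite); adjust the names to match your installation. The helper name is hypothetical.

```python
import shutil


def check_executables(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumed binary names: adjust to match your AutoDock Vina / ADFRsuite install.
status = check_executables(["vina", "prepare_receptor"])
for name, path in status.items():
    print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If an entry prints as not found, the corresponding `export PATH=...` line most likely does not point at the `bin` folder.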
from tdc import Oracle
# 1. To use one of the PDB files we prepared, pass "{PDB_ID}_docking" as the oracle name. Here is the list of prepared PDBs: https://github.com/mims-harvard/TDC/blob/428f7905374a4dfc6a3cb50cf8653be99afcd56f/tdc/metadata.py#L327
oracle = Oracle(name = '3pbl_docking')
oracle('c1ccccc1')
# Docking: 100%|██████████| 1/1 [00:02<00:00, 1.22s/ligand]
# -4.1
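# 2. One can also specify the target and binding pocket with a cleaned PDB file and box coordinates.
#    The values below are placeholders; replace them with your own receptor and pocket.
pdbid = 'XXXX'                # placeholder: name of your cleaned PDB file in ./oracle/
center = (0.0, 0.0, 0.0)      # placeholder: (x, y, z) center of the docking box, in Angstroms
boxsize = (20.0, 20.0, 20.0)  # placeholder: (x, y, z) size of the docking box, in Angstroms
oracle2 = Oracle(name = 'pyscreener', receptor_pdb_file = './oracle/' + pdbid + '.pdb', box_center = center, box_size = boxsize)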
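Docking programs such as AutoDock Vina report their score as an estimated binding free energy in kcal/mol, so more negative values indicate stronger predicted binding (the benzene example above scores -4.1). For a more intuitive scale, the standard relation ΔG = RT ln K_d can be inverted to give a rough dissociation constant; for -4.1 kcal/mol at 298.15 K this comes out at roughly 1 mM. The sketch below assumes T = 298.15 K and treats the docking score as a true binding free energy, which is only a crude approximation. The helper names are hypothetical and not part of TDC.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def score_to_kd(delta_g_kcal, temperature_k=298.15):
    """Invert dG = RT ln(Kd) to get a rough dissociation constant in molar units."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def to_maximization_objective(delta_g_kcal):
    """Flip the sign so that 'higher is better' for optimizers that maximize."""
    return -delta_g_kcal


kd = score_to_kd(-4.1)
print(f"Kd ~ {kd:.1e} M")
print(to_maximization_objective(-4.1))
```

The sign flip is useful because many molecular optimizers maximize their objective, whereas docking scores are better when lower.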
References:
ASKCOS
Description: Gao and Coley [1] have demonstrated that surrogate scoring models cannot sufficiently assess how easily a chemical can be obtained. Therefore, in addition to the SA oracle, we provide a score based on full retrosynthetic pathway analysis. TDC includes interfaces to multiple types of retrosynthetic pathway analysis as oracles and provides flexible access to their results. ASKCOS (https://askcos.mit.edu) is the open-source software framework used in [1]. It generalizes known chemistry to new substrates by learning to apply retrosynthetic transformations, identifies suitable reaction conditions, and evaluates whether reactions are likely to be successful. Its data-driven models are trained on the USPTO and Reaxys databases.
Installation instructions: Users can first deploy ASKCOS on their own server following the instructions at https://github.com/connorcoley/ASKCOS, and then access that server through our oracle function. Alternatively, one can use cloud resources such as Google Cloud Platform, which the ASKCOS authors recommend. Note that the retro transformer workers may take 5-10 minutes to start up after deployment; you can check their status under "server status". The whole deployment process on a Google Cloud virtual machine should take about 20 minutes.
For convenience, and to respect the intellectual property of the retrosynthetic analysis software, TDC accesses these tools through their APIs, so the oracle function requires additional inputs (here, the server address).
from tdc import Oracle
askcos = Oracle(name = 'ASKCOS')
smiles = 'CCOCCOCC'
host_ip = 'http://xx.xx.xxx.xxx'
askcos(smiles, host_ip, output='plausibility')
# 0.942
askcos(smiles, host_ip, output='num_step')
# 3
'''
You can also specify all the parameters of the retrosynthetic analysis in the function call:
askcos(smiles, host_ip, output='plausibility', save_json=False, file_name='tree_builder_result.json', num_trials=5,
max_depth=9, max_branching=25, expansion_time=60, max_ppg=100, template_count=1000, max_cum_prob=0.999,
chemical_property_logic='none', max_chemprop_c=0, max_chemprop_n=0, max_chemprop_o=0, max_chemprop_h=0,
chemical_popularity_logic='none', min_chempop_reactants=5, min_chempop_products=5, filter_threshold=0.1, return_first='true')
'''
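Because the API-based oracles take extra arguments (here the server address), they do not plug directly into generative-model code that expects a scoring function of a single SMILES string. One way to adapt them is to bind the extra arguments in a closure and return a fallback value when a request fails. This is a minimal sketch: `bind_oracle` and its fallback behaviour are hypothetical helpers, and a stand-in function replaces the real `askcos` oracle so the example runs without a server.

```python
from typing import Any, Callable, Optional


def bind_oracle(oracle: Callable[..., Any], *extra_args: Any,
                default: Optional[float] = None, **extra_kwargs: Any) -> Callable[[str], Any]:
    """Return a single-argument scorer that forwards the extra arguments to `oracle`."""
    def score(smiles: str) -> Any:
        try:
            return oracle(smiles, *extra_args, **extra_kwargs)
        except Exception:
            # e.g. network errors or a server whose workers are still starting up
            return default
    return score


# Hypothetical stand-in for Oracle(name='ASKCOS') so the sketch runs offline.
def fake_askcos(smiles, host_ip, output='plausibility'):
    if not host_ip.startswith('http'):
        raise ConnectionError('bad host')
    return 0.942 if output == 'plausibility' else 3


plausibility = bind_oracle(fake_askcos, 'http://xx.xx.xxx.xxx', output='plausibility')
print(plausibility('CCOCCOCC'))
```

With the real oracle you would pass the `Oracle(name = 'ASKCOS')` instance in place of `fake_askcos`. The same pattern applies to the IBM RXN oracle, whose API key is an extra argument.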
References:
Molecule.one
Description: The Molecule.one API estimates the synthetic accessibility of a molecule based on several factors, including the number of steps in the predicted synthesis plan and the cost of the starting materials. Currently, an API token can be requested from the Molecule.one website and is provided on a one-to-one basis for research use. We are working with Molecule.one to provide more open access in the near future.
Installation instructions:
- Create an account at tdc.molecule.one and copy the API token from your profile page.
- Install the Molecule.one wrapper with
pip install git+https://github.com/molecule-one/m1wrapper-python
Important Note: The Molecule.one software provided at tdc.molecule.one may be used only for non-commercial purposes in connection with evaluation on the datasets provided by the TDC benchmark. To evaluate your molecules, you need to register an account on tdc.molecule.one and accept the provided terms and conditions. After registering, please email stan@molecule.one to request activation, including a brief description of the intended use.
from tdc import Oracle
m1 = Oracle(name = 'Molecule One Synthesis', api_token = 'XXXXX')
smiles = ['[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])C=C[C@@H]2O',
'CC(=O)NC1=CC=C(O)C=C1']
m1(smiles)
'''
{'[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])C=C[C@@H]2O': '10.000',
'CC(=O)NC1=CC=C(O)C=C1': '1.1693'}
'''
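The Molecule.one oracle returns a dictionary that maps each SMILES string to its score, and the scores arrive as strings. Before ranking or averaging them you will likely want numeric values. This sketch converts the example output shown above to floats and sorts the molecules by ascending score. The helper name is hypothetical, and reading lower scores as easier syntheses is an assumption you should verify against Molecule.one's documentation.

```python
def parse_m1_scores(raw):
    """Convert Molecule.one's {smiles: 'score'} output into {smiles: float}."""
    parsed = {}
    for smiles, value in raw.items():
        try:
            parsed[smiles] = float(value)
        except (TypeError, ValueError):
            parsed[smiles] = float('nan')  # keep unparsable entries visible
    return parsed


raw = {
    '[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])C=C[C@@H]2O': '10.000',
    'CC(=O)NC1=CC=C(O)C=C1': '1.1693',
}
scores = parse_m1_scores(raw)
# Assumption: a lower score means an easier synthesis.
for smiles, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f'{value:.4f}  {smiles}')
```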
References:
IBM RXN Synthetic Accessibility
Description: IBM RXN (https://rxn.res.ibm.com) is an AI platform that integrates forward reaction prediction and retrosynthetic analysis. The backend of its retrosynthetic analysis is the Molecular Transformer model [1], which was mainly trained on the USPTO and Pistachio databases. For convenience, and to respect the intellectual property of the retrosynthetic analysis software, TDC accesses IBM RXN through its API, so the oracle function requires additional input (an API key).
Installation instructions:
- Create an account at https://rxn.res.ibm.com and copy the API key from your profile page.
- Install the IBM RXN Python wrapper with
pip install rxn4chemistry
from tdc import Oracle
oracle = Oracle(name = 'IBM_RXN')
smiles = 'CCOCCOCC'
key = 'apk-c9db......' # You can obtain a key from https://rxn.res.ibm.com
oracle(smiles, key)
# 0.983
oracle(smiles, key, output='result')
# {'retrosynthetic_paths': [{'id': '5fb1c4a98937a9000127a345',
# 'metadata': {},
# 'embed': {},
# 'computedFields': {},
# 'createdOn': 1605485737424,
# 'createdBy': 'system',
# 'modifiedOn': 1605485737424,
# 'modifiedBy': 'system',
# 'moleculeId': '5fb1c2078937a90001279fa0',
# 'retrosynthesisId': '5fb1c4a48937a9000127a336',
# 'sequenceId': '5fb1c4a98937a9000127a340',
# 'projectId': '5fb1c4868937a9000127a320',
# 'smiles': 'CCOCCOCC',
# 'confidence': 0.983,
# ......
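With `output='result'`, the oracle returns the raw response, whose `retrosynthetic_paths` entries each carry a `confidence` field. A minimal sketch for pulling out the best path's confidence, assuming the structure shown in the truncated output above; the dictionary below is a trimmed stand-in containing only the fields used, and the helper name is hypothetical.

```python
def best_path_confidence(result):
    """Return the highest 'confidence' among retrosynthetic paths, or None if absent."""
    paths = result.get('retrosynthetic_paths', [])
    confidences = [p['confidence'] for p in paths if 'confidence' in p]
    return max(confidences) if confidences else None


# Trimmed stand-in for oracle(smiles, key, output='result'); only the fields used here.
result = {
    'retrosynthetic_paths': [
        {'smiles': 'CCOCCOCC', 'confidence': 0.983},
    ]
}
print(best_path_confidence(result))
```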
References:
Glycogen Synthase Kinase 3 Beta (GSK3β)
Description: Glycogen synthase kinase 3 beta, also known as GSK3β, is an enzyme that in humans is encoded by the GSK3B gene. Abnormal regulation and expression of GSK3β is associated with an increased susceptibility to bipolar disorder. The oracle is a random forest classifier trained on ECFP6 fingerprints from the ExCAPE-DB dataset.
from tdc import Oracle
oracle = Oracle(name = 'GSK3B')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.03, 0.0, 0.0]
References:
c-Jun N-terminal Kinase 3 (JNK3)
Description: c-Jun N-terminal kinase 3 (JNK3) belongs to the mitogen-activated protein kinase family, whose members respond to stress stimuli such as cytokines, ultraviolet irradiation, heat shock, and osmotic shock. The oracle is a random forest classifier trained on ECFP6 fingerprints from the ExCAPE-DB dataset.
from tdc import Oracle
oracle = Oracle(name = 'JNK3')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.01, 0.0, 0.01]
References:
Dopamine Receptor D2 (DRD2)
Description: DRD2 stands for dopamine type 2 receptor. The oracle was constructed by Olivecrona et al. as a support vector machine classifier with a Gaussian kernel, trained on ECFP6 fingerprints from the ExCAPE-DB dataset.
from tdc import Oracle
oracle = Oracle(name = 'DRD2')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.0015465365340340924, 0.0023541754878916416, 0.004715407010872501]
References:
Synthetic Accessibility (SA)
Description: The Synthetic Accessibility (SA) score estimates how hard or easy it is to synthesize a given molecule, based on a combination of the contributions of the molecule's fragments. The oracle is calculated via RDKit, using the method defined by Ertl et al.
from tdc import Oracle
oracle = Oracle(name = 'SA')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [2.706977149048555, 2.8548373344538067, 2.659973244931228]
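The SA score of Ertl et al. ranges from 1 (easy to make) to 10 (very difficult to make), so lower is better. When an optimizer expects a higher-is-better score in [0, 1], a simple linear rescaling is one option. The sketch below is an illustrative choice, not how TDC itself reports SA, and the helper name is hypothetical.

```python
def sa_to_desirability(sa_score):
    """Linearly map an SA score from [1, 10] (lower = easier) to [0, 1] (higher = easier)."""
    clipped = min(max(sa_score, 1.0), 10.0)  # guard against out-of-range inputs
    return (10.0 - clipped) / 9.0


sa_scores = [2.706977149048555, 2.8548373344538067, 2.659973244931228]
print([round(sa_to_desirability(s), 3) for s in sa_scores])
```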
References:
Quantitative Estimate of Drug-likeness (QED)
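Description: QED stands for Quantitative Estimate of Drug-likeness. The oracle is calculated via RDKit, using the drug-likeness model defined by Bickerton et al., which combines desirability functions of several molecular properties (e.g., molecular weight, logP, and polar surface area).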
from tdc import Oracle
oracle = Oracle(name = 'QED')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.7369335974098526, 0.7965866720151891, 0.9026967965647689]
References:
Octanol-water Partition Coefficient (LogP)
Description: The penalized logP score combines the octanol-water partition coefficient (logP), a measure of lipophilicity, with penalties for poor synthetic accessibility and for large rings. The oracle is calculated via RDKit.
from tdc import Oracle
oracle = Oracle(name = 'LogP')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [2.126496327138913, 0.073949389117486, 0.48850176431612924]
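In its commonly used form, penalized logP subtracts the SA score and a penalty for rings with more than six atoms from the Crippen logP. The sketch below shows that unnormalized formula on precomputed inputs. TDC's implementation may additionally normalize each term, so its values are not expected to match this function exactly. The helper name and example inputs are hypothetical.

```python
def penalized_logp(logp, sa_score, largest_ring_size):
    """logP minus the SA score minus a penalty for rings larger than six atoms."""
    ring_penalty = max(largest_ring_size - 6, 0)  # 0 for acyclic molecules or rings <= 6
    return logp - sa_score - ring_penalty


# Hypothetical inputs: logP and SA would come from RDKit (Crippen.MolLogP and the
# SA scorer), and the largest ring size from the molecule's ring information.
print(penalized_logp(logp=3.0, sa_score=2.5, largest_ring_size=6))
print(penalized_logp(logp=3.0, sa_score=2.5, largest_ring_size=8))
```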
References:
Rediscovery
Description: This oracle aims to rediscover the target molecules Celecoxib, Troglitazone, and Thiothixene. Specifically, each sub-oracle rewards generated molecules with high Tanimoto similarity to its target (e.g., Celecoxib). From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Rediscovery')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'Celecoxib': [0.14728682170542637, 0.11666666666666667, 0.09649122807017543], 'Troglitazone': [0.24427480916030533, 0.14615384615384616, 0.12903225806451613], 'Thiothixene': [0.17391304347826086, 0.15625, 0.17796610169491525]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Celecoxib_Rediscovery')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.14728682170542637, 0.11666666666666667, 0.09649122807017543]
TDC also provides an oracle that takes any SMILES string that users want to rediscover. For example,
from tdc import Oracle
oracle = Oracle(name = 'Rediscovery_Meta', target_smiles = 'CC(=O)OC1=CC=CC=C1C(=O)O')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.16666666666666666, 0.18072289156626506, 0.2191780821917808]
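The rediscovery and similarity oracles are built on Tanimoto similarity between molecular fingerprints. For two fingerprints viewed as sets of "on" bits A and B, it is |A ∩ B| / |A ∪ B|. The sketch below computes it on hypothetical bit sets; in practice the bits would come from a fingerprint such as RDKit's Morgan (ECFP-style) fingerprint, and the exact fingerprint type used by each oracle is not specified here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical on-bit indices for a target and two candidates.
target = {1, 5, 9, 12, 40}
candidates = [{1, 5, 9, 12, 41}, {2, 3, 40}]
print([round(tanimoto(target, c), 3) for c in candidates])
```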
References:
Similarity/Dissimilarity
Description: This oracle aims to generate molecules similar or dissimilar to Aripiprazole, Albuterol, or Mestranol. Note that these molecules should be removed from the training set. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Similarity')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'Aripiprazole': [0.5356125356125356, 0.3908045977011494, 0.39143730886850153], 'Albuterol': [0.2772277227722772, 0.38095238095238093, 0.3589743589743589], 'Mestranol': [0.19460880999342536, 0.2567901234567901, 0.2612872238232469]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Aripiprazole_Similarity')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.5356125356125356, 0.3908045977011494, 0.39143730886850153]
TDC also provides an oracle that takes any SMILES string that users want their molecules to be similar or dissimilar to. For example,
from tdc import Oracle
oracle = Oracle(name = 'Similarity_Meta', target_smiles = 'CC(=O)OC1=CC=CC=C1C(=O)O')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.23076923076923078, 0.1951219512195122, 0.2361111111111111]
References:
Median Molecules
Description: This oracle aims to generate molecules that simultaneously maximize similarity to several molecules. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Median')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'Median 1': [0.09722243533981723, 0.14166129393101462, 0.12765694770084507], 'Median 2': [0.12259690287307903, 0.11470387424947118, 0.11491261514365983]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Median 1')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.09722243533981723, 0.14166129393101462, 0.12765694770084507]
TDC also provides an oracle that takes any two SMILES strings that users want to simultaneously maximize similarities with. For example,
from tdc import Oracle
tadalafil_smiles = 'O=C1N(CC(N2C1CC3=C(C2C4=CC5=C(OCO5)C=C4)NC6=C3C=CC=C6)=O)C'
sildenafil_smiles = 'CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C'
oracle = Oracle(name = 'Median_Meta', target_smiles = (tadalafil_smiles, sildenafil_smiles))
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.12259690287307903, 0.11470387424947118, 0.11491261514365983]
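The median-molecule tasks reward a candidate only when it is similar to all targets at once. A geometric mean captures this, because a near-zero similarity to any single target drags the whole score toward zero, whereas an arithmetic mean would not. A minimal sketch, assuming the per-target similarities have already been computed (the values below are hypothetical, and the helper is not TDC's own code):

```python
import math


def geometric_mean(scores):
    """Geometric mean of non-negative scores; returns 0.0 if any score is 0."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


balanced = [0.5, 0.5]      # moderately similar to both targets
lopsided = [0.98, 0.02]    # nearly identical to one target, unlike the other
print(round(geometric_mean(balanced), 3), round(geometric_mean(lopsided), 3))
```

Both lists have an arithmetic mean of 0.5, but the geometric mean scores the lopsided candidate far lower.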
References:
Isomers
Description: This oracle aims to generate molecules that match a target molecular formula (e.g., C7H8N2O2). It assesses the flexibility of the model to generate molecules following a simple pattern that is not known a priori. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Isomers')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'c7h8n2o2': [7.077155389805107e-22, 9.454886273886542e-18, 3.7105915150029394e-14], 'c9h10n2o2pf2cl': [3.775134544279098e-11, 4.944450501938644e-09, 1.1793585051615319e-07]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Isomers_C7H8N2O2')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [7.077155389805107e-22, 9.454886273886542e-18, 3.7105915150029394e-14]
TDC also provides an oracle that takes any SMILES string, converts it to its molecular formula, and uses that formula as the target. For example,
from tdc import Oracle
oracle = Oracle(name = 'Isomers_Meta', target_smiles = 'CC(=O)OC1=CC=CC=C1C(=O)O')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [4.632351332478028e-57, 5.853717984129625e-37, 3.120771099829009e-32]
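An isomer score can be built by comparing the element counts of a candidate's molecular formula with those of the target formula. The sketch below parses simple formulas (no parentheses or charges), scores each target element count with a Gaussian centered on the target count, adds a term for the total atom count, and combines the terms with a geometric mean. The widths (sigma = 1 per element, 2 for the total) reflect our understanding of GuacaMol's isomer scoring and should be treated as assumptions, as should the helper names. A candidate's formula could be obtained with RDKit's `rdMolDescriptors.CalcMolFormula`.

```python
import math
import re


def parse_formula(formula):
    """Parse a simple formula like 'C7H8N2O2' into {'C': 7, 'H': 8, 'N': 2, 'O': 2}."""
    counts = {}
    for element, number in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian: 1.0 at x == mu, decaying as x moves away."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def isomer_score(candidate_formula, target_formula):
    cand, target = parse_formula(candidate_formula), parse_formula(target_formula)
    # One term per element in the target formula (assumed sigma = 1.0).
    terms = [gaussian(cand.get(el, 0), n, 1.0) for el, n in target.items()]
    # One term for the total number of atoms (assumed sigma = 2.0).
    terms.append(gaussian(sum(cand.values()), sum(target.values()), 2.0))
    if any(t == 0.0 for t in terms):  # a term underflowed for a huge deviation
        return 0.0
    # Geometric mean of all terms.
    return math.exp(sum(math.log(t) for t in terms) / len(terms))


print(isomer_score('C7H8N2O2', 'C7H8N2O2'))  # exact isomer
print(isomer_score('C8H10N2O2', 'C7H8N2O2'))
```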
References:
Multi-Property Objective (MPO)
Description: This oracle measures multiple physicochemical properties relative to those of a known drug, so each drug defines a multi-property objective. It covers seven drugs (Osimertinib, Fexofenadine, Ranolazine, Perindopril, Amlodipine, Sitagliptin, Zaleplon), each with its own set of objectives. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'MPO')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'Osimertinib': [0.09011742702110873, 0.4083890176872189, 0.0069208742335098465], 'Fexofenadine': [0.4336446174984538, 0.5101327504385935, 0.01074314980818085], 'Ranolazine': [0.29285467466584664, 0.027222138370807142, 0.015384988076712304], 'Perindopril': [0.36023741111440966, 0.1540877417148235, 0.13584848674330968], 'Amlodipine': [0.461083967620704, 0.15454027643871737, 0.15152116723579184], 'Sitagliptin': [0.00562486906491877, 0.008394273324064522, 0.0036371294214424814], 'Zaleplon': [7.752152611462035e-05, 8.370947134491376e-05, 1.3261169904325478e-05]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Osimertinib_MPO')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.09011742702110873, 0.4083890176872189, 0.0069208742335098465]
References:
Valsartan SMARTS
Description: The valsartan SMARTS benchmark targets molecules that contain a SMARTS pattern related to valsartan while having physicochemical properties corresponding to those of sitagliptin. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Valsartan_SMARTS')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.0, 0.0, 0.0]
References:
Hop
Description: The Scaffold Hop and Decorator Hop benchmarks aim to maximize similarity to a SMILES string while keeping or excluding specific SMARTS patterns. They mimic two tasks: changing the scaffold of a compound while keeping specific substituents, and keeping a scaffold fixed while changing the substitution pattern. From the GuacaMol benchmark.
from tdc import Oracle
oracle = Oracle(name = 'Hop')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# {'Deco Hop': [0.5338365434669443, 0.5200860832137733, 0.5038648836670017], 'Scaffold Hop': [0.38446411012782694, 0.36368563685636857, 0.3391736019856913]}
Note: You can also access individual oracles in the set. For example,
from tdc import Oracle
oracle = Oracle(name = 'Scaffold Hop')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])
# [0.38446411012782694, 0.36368563685636857, 0.3391736019856913]
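Both hop tasks combine a similarity term with substructure constraints. The sketch below shows one simplified way to express that idea: a similarity clipped so that anything at or above a chosen level counts as fully satisfied, multiplied by a gate that is 1 only when every required pattern is present and every forbidden pattern is absent. This is an illustrative assumption, not GuacaMol's exact scoring function. The pattern checks are passed in as booleans (with RDKit they could come from `Mol.HasSubstructMatch` on SMARTS queries), and the 0.75 threshold and helper names are hypothetical.

```python
def clipped(similarity, upper=0.75):
    """Linearly rescale similarity so that values >= upper score 1.0."""
    return min(max(similarity, 0.0) / upper, 1.0)


def hop_score(similarity, required_present, forbidden_present, upper=0.75):
    """Clipped similarity, zeroed unless all required patterns match and no forbidden one does."""
    ok = all(required_present) and not any(forbidden_present)
    return clipped(similarity, upper) if ok else 0.0


# Hypothetical candidate: keeps the required decoration, drops the forbidden scaffold.
print(hop_score(0.6, required_present=[True], forbidden_present=[False]))
# Same similarity, but the forbidden scaffold is still present.
print(hop_score(0.6, required_present=[True], forbidden_present=[True]))
```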
References: