2023.04.17 TDC 0.4.0 is released! We're excited to announce the release of a new interface, tdc_hf_interface, that allows users to easily access and leverage pre-trained models hosted on HuggingFace for TDC datasets and tasks. In this first batch, we've released nine pre-trained models from DeepPurpose that cover three popular ADMET datasets in the Commons. To load a pre-trained model, simply do the following:
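    from tdc import tdc_hf_interface

    # Select a pre-trained model hosted on HuggingFace by name
    tdc_hf = tdc_hf_interface("BBB_Martins-AttentiveFP")
    # Load the DeepPurpose model, using ./data as the local directory
    dp_model = tdc_hf.load_deeppurpose('./data')
    # Predict for a list of SMILES strings
    tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])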
The TDC-HF space is located here. Stay tuned for more exciting pre-trained models, tasks & demos!
0.3.9 is released! Here are the changes:
- TDC has 9 new datasets on high-throughput screening (HTS). These assays cover a wide range of protein target classes and are carefully collated through confirmation screens to validate active compounds. See here on how to access them!
| Protein Target Class | PubChem AID | Protein Target | Total # of Molecules | # of Active Molecules |
|---|---|---|---|---|
| GPCR | 1798 | M1 Muscarinic Receptor Agonists | 61,833 | 187 |
| GPCR | 435034 | M1 Muscarinic Receptor Antagonists | 61,756 | 362 |
| Ion Channel | 1843 | Potassium Ion Channel Kir2.1 | 301,493 | 172 |
| Ion Channel | 2258 | KCNQ2 Potassium Channel | 302,405 | 213 |
| Ion Channel | 463087 | Cav3 T-type Calcium Channels | 100,875 | 703 |
| Kinase | 2689 | Serine/Threonine Kinase 33 | 319,792 | 172 |
- TDC has an additional dataset on hERG in the Tox task. See here for more info!
- TDC now follows black code style!
0.3.8 is released! Here are the changes:
- TDC has a new task on structure-based drug design (SBDD) with four datasets, including PDBBind, DUD-E, and scPDB. See here on how to access them!
- To support evaluation of SBDD tasks, we also include two evaluation metrics (RMSD, Kabsch-RMSD) that compare distances between two structures. See here for more info.
- TDC has a new dataset on PAMPA (parallel artificial membrane permeability assay), a commonly used assay for evaluating drug permeability across the cellular membrane, in the ADME task. See here for more info!
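To illustrate the difference between the two SBDD metrics mentioned above, here is a minimal NumPy sketch. It is not TDC's implementation: the function names `rmsd` and `kabsch_rmsd` are hypothetical, and the sketch assumes both structures are given as (N, 3) coordinate arrays with atoms in the same order. Plain RMSD compares the coordinates as they are, while Kabsch-RMSD first removes the translation and applies the optimal rotation.

```python
import numpy as np


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Plain RMSD between two (N, 3) arrays with matched atom order."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD after optimally superimposing a onto b (translation + rotation)."""
    # Remove translation by centering both point sets on their centroids.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    h = a_c.T @ b_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered coordinates of a and compare with centered b.
    return rmsd(a_c @ rot.T, b_c)
```

A rigidly rotated and shifted copy of a structure gives a large plain RMSD but a Kabsch-RMSD of essentially zero, which is why the aligned variant is useful when only the shape of a pose matters.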
0.3.7 is released! Here are the changes:
- TDC has a new evaluation metric on logAUC. See here and the PR.
- TDC now supports graphein protein 3D representation for antibody developability prediction. See tutorial and the PR.
- Datasets in the QM task are now in 3D format. See here.
- TDC has a harmonize function to deal with duplicated experimental entries in DTI. See here.
- TDC now has a dataloader for PrimeKG as an auxiliary resource. See how to access PrimeKG here.
- TDC fixed a static scikit-learn version issue for the gsk3b, jnk3, and drd2 oracles. See here for more info.
- The PPBR dataset in the ADME task now has additional species information. By default it contains only Homo sapiens, and you can retrieve other species via a TDC function. See here for more info.
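The idea behind harmonizing duplicated DTI entries, mentioned in the list above, can be sketched in plain Python. This is an illustration only and not TDC's function: the name `harmonize`, the tuple layout, and the `mode` values are assumptions made for the example. Repeated measurements of the same drug-target pair are collapsed into a single value, either by averaging or by keeping the maximum.

```python
from collections import defaultdict
from statistics import mean


def harmonize(records, mode="mean"):
    """Collapse repeated (drug, target) measurements into one value per pair.

    records: iterable of (drug_id, target_id, affinity) tuples.
    mode: "mean" averages the replicates; "max" keeps the largest value.
    """
    if mode not in ("mean", "max"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Group all affinity values by their (drug, target) pair.
    grouped = defaultdict(list)
    for drug, target, affinity in records:
        grouped[(drug, target)].append(affinity)
    # Reduce each group to a single representative value.
    reduce = mean if mode == "mean" else max
    return {pair: reduce(values) for pair, values in grouped.items()}
```

Which reduction is appropriate depends on the assay; averaging smooths replicate noise, while taking an extreme value keeps the strongest reported measurement.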
0.3.6 is released! TDC has a new task on TCR-Epitope Binding prediction (Thanks to Anna and Jannis)! Check out here for more information.
0.3.5 is released! Here are the changes:
- TDC has an updated ChEMBL library (Version 29) in MolGen! The previous version is still available. Check out here for more information.
- Reaction type information can be found within the split by turning on the include_reaction_type flag for USPTO-50 in RetroSyn! Check out here for more information.
- Fixed a bug in cold split for higher-order (>2) multi-instance prediction tasks! (Thanks to Zoe!) Check out here for more information.
0.3.4 is released! Bug fixes on docking oracles and KL divergence measure.
0.3.3 is released! We have added extended support for cold split in multi-instance prediction tasks; see this issue!
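As a rough illustration of what a cold split over several entity columns means, here is a small pure-Python sketch. It is not TDC's implementation: the function name `cold_split`, its parameters, and the choice to drop mixed rows are assumptions made for this example. A fraction of the entities in each chosen column is held out, the test set keeps only rows whose entities are all held out, and the training set keeps only rows with no held-out entity.

```python
import random


def cold_split(rows, columns, test_frac=0.2, seed=0):
    """Split rows so that test entities in `columns` never appear in training.

    rows: list of dicts, one per data point.
    columns: entity columns (e.g. drug, target) that must be unseen in test.
    """
    rng = random.Random(seed)
    held_out = {}
    for col in columns:
        # Sort for reproducibility before sampling the held-out entities.
        entities = sorted({row[col] for row in rows})
        k = max(1, round(len(entities) * test_frac))
        held_out[col] = set(rng.sample(entities, k))

    train, test = [], []
    for row in rows:
        flags = [row[col] in held_out[col] for col in columns]
        if all(flags):
            test.append(row)
        elif not any(flags):
            train.append(row)
        # Mixed rows are dropped: keeping them in training would leak a
        # held-out entity, and they are not fully "cold" for testing.
    return train, test
```

With more columns, fewer rows qualify for the test set, so the held-out fraction per column usually needs to be larger than the desired test fraction.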
0.3.2 is released! We have added support for harmonizing identical DTIs with different affinities (KIBA and DAVIS are updated accordingly; see this issue), added support for label name retrieval for TWOSIDES (this issue), and added gene symbol info to GDSC (this issue).
0.3.0 is released! We have greatly restructured the code to be contributor friendly while keeping most interfaces the same. We have also released the documentation for the TDC package here.
2021.05.30 TDC updates to 0.2.0, major changes:
- TDC has a new molecule generation benchmark on docking scores! Check out here for more information.
2021.03.24 TDC updates to 0.1.9, major changes:
- TDC now supports molecule filters! Check out here for more information.
2021.03.17 TDC updates to 0.1.8, major changes:
- Leaderboard is reformulated and we invite submissions for each individual benchmark! Check out here for more information.
2021.02.26 TDC updates to 0.1.7, major changes:
- Streamlined leaderboard programming framework! Check out here for more information.
- Label log transformation supported. Check out here for more information.
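For readers unfamiliar with label log transformation, the following sketch shows one common form of it. This is an assumption for illustration and not necessarily the transformation TDC applies: it treats labels as concentrations in nM and converts them to the negative log10 molar scale (as in pIC50 or pKd). The function names are hypothetical.

```python
import math


def to_p_scale(value_nm: float) -> float:
    """Convert a concentration in nM to -log10 of the molar concentration."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm).
    return 9.0 - math.log10(value_nm)


def from_p_scale(p_value: float) -> float:
    """Invert to_p_scale: convert a p-scale value back to nM."""
    return 10.0 ** (9.0 - p_value)
```

Working on the log scale compresses labels that span several orders of magnitude, which tends to make regression targets better behaved.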
2021.02.18 TDC just released the white paper on arXiv! Here is the link to the paper.
2021.02.04 TDC updates to 0.1.6, major changes:
- New Leaderboard! Just released the second leaderboard on drug combination response prediction! Check out here for usage.
2021.01.16 TDC updates to 0.1.5, major changes:
- New Oracles! Added four realistic oracles from docking scores and synthetic accessibility scores! Check out here for usage.
2021.01.09 TDC updates to 0.1.4, major changes:
- New Function! Added a data processing helper to map among ~15 molecular formats in 2 lines of code (For 2D: from SMILES/SELFIES and convert to SELFIES/SMILES, Graph2D, PyG, DGL, ECFP2-6, MACCS, Daylight, RDKit2D, Morgan, PubChem; For 3D: from XYZ, SDF files to Graph3D, Coulomb Matrix). Check out here for usage.
- Quality Check! Canonicalize SMILES on DTI datasets with Drug, Target IDs added. Check out
2020.12.30 TDC updates to 0.1.3, major changes:
- New Dataset! Added a new therapeutic task CRISPR Repair Outcome Prediction! Check out
- New Function! Added a data processing helper to map SMILES string to popular cheminformatics fingerprints (ECFP2, ECFP4, ECFP6, MACCS, Daylight-type, RDKit2D, Morgan, PubChem)! Check out here for usage.
2020.12.24 TDC updates to 0.1.2, major changes:
- Leaderboard Release! TDC's first leaderboard on ADMET prediction is released. You can find the leaderboard guide here, where we provide a BenchmarkGroup class to do model building on leaderboard tasks rapidly. The ADMET leaderboard is here.
2020.12.19 TDC updates to 0.1.1, major changes:
- Quality Check and New Datasets! We replaced the VD, Half Life, and Clearance datasets in ADME with versions from new, higher-quality sources. We also added LD50 to
2020.12.15 TDC updates to 0.1.0, major changes:
- Five New Datasets! Added CYP2C9/2D6/3A4 Substrate for ADME, Carcinogens for Tox, and NCI-60 for
- Quality Check. We conducted a canonicalization of all SMILES and removed ones that return errors in the
2020.11.30 TDC updates to 0.0.8, major changes:
- Five New Datasets! Added hERG, DILI (Drug Induced Liver Injury), Skin Reaction, and Ames Mutagenicity for Tox, and PPBR from AstraZeneca for
- Distribution Learning Metrics Moved to Evaluators. Check out here for the updated usage.
- Meta Oracles. We included a helper function where you can specify your own set of molecules for Rediscovery, Similarity, Medians, Isomers. Check out an example usage here.
- Tutorials. We have provided various tutorials for you to start using TDC. Click here.