2023.07.10 TDC 0.4.1 is released! TDC has an exciting new task on clinical trial outcome prediction (thanks to Tianfan)! See here for more information.

2023.04.17 TDC 0.4.0 is released! We're excited to announce a new interface, tdc_hf_interface, that lets users easily access pre-trained models hosted on HuggingFace and apply them to TDC datasets and tasks. In this first batch, we've released nine pre-trained models from DeepPurpose that cover three popular ADMET datasets in the Commons. To load a pre-trained model, simply do the following:

from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("BBB_Martins-AttentiveFP")
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])

The TDC-HF space is located here. Stay tuned for more exciting pre-trained models, tasks & demos!

2023.01.26 TDC 0.3.9 is released! Here are the changes:

  • TDC has 9 new datasets on high-throughput screening (HTS). These assays cover a wide range of protein target classes and are carefully collated through confirmation screens to validate active compounds. See here on how to access them!
| Protein Target Class | PubChem AID | Protein Target | Total # of Molecules | # of Active Molecules |
|---|---|---|---|---|
| GPCR | 435008 | Orexin1 Receptor | 218,158 | 233 |
| GPCR | 1798 | M1 Muscarinic Receptor Agonists | 61,833 | 187 |
| GPCR | 435034 | M1 Muscarinic Receptor Antagonists | 61,756 | 362 |
| Ion Channel | 1843 | Potassium Ion Channel Kir2.1 | 301,493 | 172 |
| Ion Channel | 2258 | KCNQ2 Potassium Channel | 302,405 | 213 |
| Ion Channel | 463087 | Cav3 T-type Calcium Channels | 100,875 | 703 |
| Transporter | 488997 | Choline Transporter | 302,306 | 252 |
| Kinase | 2689 | Serine/Threonine Kinase 33 | 319,792 | 172 |
| Enzyme | 485290 | Tyrosyl-DNA Phosphodiesterase | 341,365 | 281 |
  • TDC has an additional dataset on hERG in the Tox task. See here for more info!
  • TDC now follows black code style!

2022.11.03 TDC 0.3.8 is released! Here are the changes:

  • TDC has a new task on structure-based drug design (SBDD) with four datasets, including PDBBind, DUD-E, and scPDB. See here on how to access them!
  • To support evaluation of SBDD tasks, we also include two evaluation metrics (RMSD, Kabsch-RMSD) that compare distances between two structures. See here for more info.
  • TDC has a new dataset in the ADME task on PAMPA (parallel artificial membrane permeability assay), a commonly used assay for evaluating drug permeability across the cellular membrane. See here for more info!
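
As a rough illustration of what the plain RMSD metric in this release measures, here is a minimal pure-Python sketch. The function name `rmsd` and the list-of-tuples coordinate layout are assumptions made for this example and are not TDC's evaluator API. Kabsch-RMSD would additionally superimpose the two structures with an optimal rotation and translation before taking the same measurement; that alignment step is not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Hypothetical helper for illustration only; atoms are assumed to be
    listed in the same order in both structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and have the same number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two-atom toy structure, and the same structure shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, shifted))
```

Because every atom in the toy example moves by exactly one unit, the plain RMSD is 1. An alignment-based variant would report a value near 0 for the same pair, since a pure translation can be superimposed away.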

2022.09.06 TDC 0.3.7 is released! Here are the changes:

  • TDC has a new evaluation metric on logAUC. See here and the PR.
  • TDC now supports graphein protein 3D representation for antibody developability prediction. See tutorial and the PR.
  • QM tasks are now in 3D format. See here.
  • TDC has a harmonize function to deal with duplicated experimental entries in DTI. See here.
  • TDC now has a dataloader for PrimeKG as an auxiliary resource. See how to access PrimeKG here.
  • TDC fixed the static scikit-learn version issue for the gsk3b, jnk3, and drd2 oracles. See here for more info.
  • The PPBR dataset in the ADME task now includes species information. By default it contains only Homo sapiens entries; other species can be retrieved via a TDC function. See here for more info.
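
To illustrate the idea behind harmonizing duplicated DTI entries, here is a minimal pure-Python sketch. It is not TDC's harmonize function: the helper name `harmonize`, the `(drug, target, affinity)` record layout, and the `"mean"` / `"max"` modes are all assumptions chosen for this example.

```python
from collections import defaultdict
from statistics import mean


def harmonize(records, mode="mean"):
    """Collapse repeated (drug, target) measurements into one value per pair.

    Hypothetical helper for illustration; `records` is an iterable of
    (drug_id, target_id, affinity) tuples.
    """
    reducers = {"mean": mean, "max": max}
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")

    # Group all affinities measured for the same drug-target pair.
    grouped = defaultdict(list)
    for drug, target, affinity in records:
        grouped[(drug, target)].append(affinity)

    # Reduce each group to a single representative value.
    reduce_fn = reducers[mode]
    return {pair: reduce_fn(values) for pair, values in grouped.items()}


records = [
    ("drugA", "targetX", 10.0),
    ("drugA", "targetX", 30.0),  # duplicate pair with a different affinity
    ("drugB", "targetX", 5.0),
]
print(harmonize(records, mode="mean"))
print(harmonize(records, mode="max"))
```

Which reduction is appropriate depends on the assay and the unit of the label, so the choice of mode is left to the caller in this sketch.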

2022.02.19 TDC 0.3.6 is released! TDC has a new task on TCR-Epitope Binding prediction (thanks to Anna and Jannis)! See here for more information.

2022.01.23 TDC 0.3.5 is released! Here are the changes:

  • TDC has a new large hERG dataset in Tox (thanks to Ben)! See here for more information.
  • TDC has an updated ChEMBL library (Version 29) in MolGen! The previous version is still available. See here for more information.
  • Reaction type information can be included in the split by turning on the include_reaction_type flag for USPTO-50 in RetroSyn! See here for more information.
  • Fixed a bug in the cold split for higher-order (>2) multi-instance prediction tasks (thanks to Zoe)! See here for more information.

2021.12.28 TDC 0.3.4 is released! This release fixes bugs in the docking oracles and the KL divergence measure.

2021.11.25 TDC 0.3.3 is released! We added extended support for cold split in multi-instance prediction tasks; see this issue!
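
For readers unfamiliar with the term, a cold split holds out whole entities (for example, drugs or targets) so that the test set contains only entities never seen in training. The sketch below is a simplified two-way version in pure Python and is not TDC's implementation: the function name `cold_split`, the dict-based rows, and the choice to drop rows that mix seen and held-out entities are all assumptions made for illustration.

```python
import random


def cold_split(rows, columns, test_frac=0.2, seed=0):
    """Split rows so that held-out entities in `columns` appear only in the test set.

    Hypothetical helper for illustration; `rows` is a list of dicts and
    `columns` names the entity columns to split on.
    """
    rng = random.Random(seed)

    # Sample a set of held-out entities independently for each column.
    held_out = {}
    for col in columns:
        entities = sorted({row[col] for row in rows})
        k = max(1, round(len(entities) * test_frac))
        held_out[col] = set(rng.sample(entities, k))

    train, test = [], []
    for row in rows:
        flags = [row[col] in held_out[col] for col in columns]
        if all(flags):
            test.append(row)       # every entity in the row is unseen
        elif not any(flags):
            train.append(row)      # no entity in the row is held out
        # Rows mixing seen and held-out entities are dropped in this sketch.
    return train, test


# Toy pairwise dataset: every combination of 5 drugs and 5 targets.
rows = [
    {"drug": f"D{i}", "target": f"T{j}", "label": i * j}
    for i in range(5)
    for j in range(5)
]
train, test = cold_split(rows, columns=["drug", "target"])
print(len(train), len(test))
```

The same loop works for more than two entity columns, at the cost of discarding a growing share of mixed rows as the number of columns increases.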

2021.10.17 TDC 0.3.2 is released! We have added support for harmonizing duplicate DTIs that have different affinities (KIBA and DAVIS are updated accordingly; see this issue), support for label name retrieval for TWOSIDES (this issue), and gene symbol information for GDSC (this issue).

2021.09.04 TDC 0.3.0 is released! We have greatly restructured the code to be contributor-friendly while keeping most interfaces the same. We have also released the documentation for the TDC package here.

2021.05.30 TDC updates to 0.2.0, major changes:

  • TDC has a new molecule generation benchmark on docking scores! See here for more information.

2021.03.24 TDC updates to 0.1.9, major changes:

  • TDC now supports molecule filters! See here for more information.

2021.03.17 TDC updates to 0.1.8, major changes:

  • The leaderboard has been reformulated, and we invite submissions for each individual benchmark! See here for more information.

2021.02.26 TDC updates to 0.1.7, major changes:

  • Streamlined leaderboard programming framework! See here for more information.
  • Label log transformation is now supported. See here for more information.
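
Labels such as binding affinities often span several orders of magnitude, which is why a log transformation can help. The sketch below shows one common convention, converting a concentration in nM to a negative log10 molar value (for example, Kd to pKd), in pure Python. The helper names `to_log_scale` and `from_log_scale` are hypothetical, and the exact formula TDC applies may differ.

```python
import math


def to_log_scale(value_nm):
    """Convert a concentration in nM to a negative log10 molar value.

    Hypothetical helper for illustration; assumes a strictly positive input.
    """
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value_nm * 1e-9)  # nM -> M, then -log10


def from_log_scale(p_value):
    """Inverse of to_log_scale: negative log10 molar value back to nM."""
    return 10 ** (-p_value) / 1e-9


# 1000 nM is 1 micromolar, which is approximately 6 on the negative log10 scale.
print(to_log_scale(1000.0))
print(from_log_scale(to_log_scale(250.0)))
```

The round trip recovers the original value up to floating-point error, so predictions made on the log scale can be mapped back to the original unit for reporting.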

2021.02.18 TDC just released the white paper on arXiv! Here is the link to the paper.

2021.02.04 TDC updates to 0.1.6, major changes:

  • New Leaderboard! We just released the second leaderboard, on drug combination response prediction! See here for usage.

2021.01.16 TDC updates to 0.1.5, major changes:

  • New Oracles! Added four realistic oracles based on docking scores and synthetic accessibility scores! See here for usage.

2021.01.09 TDC updates to 0.1.4, major changes:

  • New Function! Added a data processing helper to map among ~15 molecular formats in 2 lines of code (For 2D: from SMILES/SELFIES to SELFIES/SMILES, Graph2D, PyG, DGL, ECFP2-6, MACCS, Daylight, RDKit2D, Morgan, PubChem; For 3D: from XYZ and SDF files to Graph3D and Coulomb Matrix). See here for usage.
  • Quality Check! Canonicalized SMILES on DTI datasets and added Drug and Target IDs. See DTI.

2020.12.30 TDC updates to 0.1.3, major changes:

  • New Dataset! Added a new therapeutic task, CRISPR Repair Outcome Prediction! See CRISPROutcome.
  • New Function! Added a data processing helper to map SMILES strings to popular cheminformatics fingerprints (ECFP2, ECFP4, ECFP6, MACCS, Daylight-type, RDKit2D, Morgan, PubChem)! See here for usage.

2020.12.24 TDC updates to 0.1.2, major changes:

  • Leaderboard Release! TDC's first leaderboard, on ADMET prediction, is released. You can find the leaderboard guide here, where we provide a BenchmarkGroup class for rapidly building models on leaderboard tasks. The ADMET leaderboard is here.

2020.12.19 TDC updates to 0.1.1, major changes:

  • Quality Check and New Datasets! We replaced the VD, Half Life, and Clearance datasets in ADME with versions from new, higher-quality sources. We also added LD50 to Tox.

2020.12.15 TDC updates to 0.1.0, major changes:

  • Five New Datasets! Added CYP2C9/2D6/3A4 Substrate for ADME, Carcinogens for Tox, and NCI-60 for DrugSyn.
  • Quality Check. We canonicalized all SMILES in the ADME, Tox, and HTS datasets and removed those that returned errors.

2020.11.30 TDC updates to 0.0.8, major changes:

  • Five New Datasets! Added hERG, DILI (Drug Induced Liver Injury), Skin Reaction, and Ames Mutagenicity for Tox, and PPBR from AstraZeneca for ADME.
  • Distribution Learning Metrics Moved to Evaluators. See here for the updated usage.
  • Meta Oracles. We included a helper function that lets you specify your own set of molecules for Rediscovery, Similarity, Medians, and Isomers. See an example usage here.
  • Tutorials. We have provided various tutorials to help you start using TDC. Click here.