Overview of TDC Datasets

At its core, TDC collects ML tasks and associated datasets spread across therapeutic domains. These tasks and datasets have the following properties:

  • Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers a variety of learning tasks, ranging from wet-lab target identification to biomedical product manufacturing.
  • Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing.
  • Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.

Machine Learning Tasks in TDC

TDC tasks cover a range of therapeutic modalities and pipelines. The modalities span small molecules and biologics; the biologics include antibodies, peptides, miRNAs, and genome editing therapeutics.

Further, TDC tasks map to the following drug discovery and development pipelines:

  • Target discovery: Tasks to identify candidate drug targets.
  • Activity modeling: Tasks to screen and generate individual or combinatorial candidates with high binding activity towards targets.
  • Efficacy and safety: Tasks to optimize therapeutic signatures indicative of drug safety and efficacy.
  • Manufacturing: Tasks to synthesize therapeutics.

Below is a summary table of TDC learning tasks. To explore datasets, click the tag for the task of interest.

Therapeutic Products: Small Molecules, Macromolecules, Cell & Gene Therapy
Development Pipelines: Target Discovery, Activity Modeling, Efficacy & Safety, Manufacturing
ML Tasks: ADME, Tox, HTS, QM, Yields, Paratope, Epitope, Develop, CRISPROutcome, DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, MolGen, RetroSyn, Reaction
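
The task tags above also serve as loader names in the PyTDC Python package. The sketch below shows one way to look up where a tag would be imported from. The grouping of tags into the `single_pred`, `multi_pred`, and `generation` modules is an assumption based on the PyTDC package layout and is not stated in this overview; `TASKS_BY_MODULE` and `import_statement` are hypothetical names used only for illustration.

```python
# Assumed grouping of the task tags by PyTDC module (not stated in this overview).
TASKS_BY_MODULE = {
    "single_pred": [
        "ADME", "Tox", "HTS", "QM", "Yields",
        "Paratope", "Epitope", "Develop", "CRISPROutcome",
    ],
    "multi_pred": [
        "DTI", "DDI", "PPI", "GDA", "DrugRes",
        "DrugSyn", "PeptideMHC", "AntibodyAff", "MTI", "Catalyst",
    ],
    "generation": ["MolGen", "RetroSyn", "Reaction"],
}


def import_statement(tag: str) -> str:
    """Return the import line for a task tag; raise KeyError if the tag is unknown."""
    for module, tags in TASKS_BY_MODULE.items():
        if tag in tags:
            return f"from tdc.{module} import {tag}"
    raise KeyError(f"unknown task tag: {tag}")


if __name__ == "__main__":
    # Print the import line for one tag from each assumed module.
    for tag in ("ADME", "DTI", "MolGen"):
        print(import_statement(tag))
```

With PyTDC installed, a loader imported this way is typically instantiated with a dataset name taken from the corresponding task page, and a train/validation/test split is then requested from it; consult the task pages for the dataset names each loader accepts.
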
TDC maintains a list of external resources relevant to drug discovery. Click here to see them.
