Overview of TDC Datasets

At its core, TDC collects ML tasks and associated datasets across therapeutic modalities and stages of discovery. These tasks and datasets have the following properties:

  • Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers learning tasks that range from wet-lab target identification to biomedical product manufacturing.
  • Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing.
  • Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.

Machine Learning Tasks in TDC

ML tasks cover a range of therapeutic modalities, including small molecules and biologics such as antibodies, peptides, miRNAs, and gene editing therapies. They also map onto the stages of the drug discovery and development pipeline:

  • Target discovery: Tasks to identify candidate drug targets.
  • Activity modeling: Tasks to screen and generate individual or combinatorial candidates with high binding activity towards targets.
  • Efficacy and safety: Tasks to optimize therapeutic signatures indicative of drug safety and efficacy.
  • Manufacturing: Tasks in support of synthesis and manufacturing of therapeutics.
Table: ML tasks in TDC, by therapeutic modality and stage of discovery and development

  • Therapeutic modalities: Small Molecules, Macromolecules, Cell & Gene Therapy, Peptides
  • Stages of discovery and development: Target Discovery, Activity Modeling, Efficacy & Safety, Clinical Trial, Manufacturing
  • ML tasks: ADME, Tox, HTS, QM, Yields, Epitope, Develop, CRISPROutcome, DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TrialOutcome, MolGen, RetroSyn, Reaction, MPC
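
As a rough illustration of how one of the tasks named above might be loaded in practice, here is a minimal sketch. It assumes the PyTDC Python package is installed (it provides the `tdc` module) and that task loaders are grouped under the `single_pred`, `multi_pred`, and `generation` modules. The helper `load_split`, the constant `PROBLEMS`, and the dataset name in the usage note are illustrative and do not come from this overview.

```python
import importlib

# Problem categories under which PyTDC groups its task loaders.
# Assumption: these module names come from the PyTDC package, not from this page.
PROBLEMS = {"single_pred", "multi_pred", "generation"}


def load_split(problem, task, name):
    """Load one TDC dataset and return its default train/valid/test split.

    Hypothetical helper for illustration; not part of TDC itself.
    """
    if problem not in PROBLEMS:
        raise ValueError(f"unknown problem type: {problem!r}")
    # Imported lazily so this helper can be defined without PyTDC installed.
    module = importlib.import_module(f"tdc.{problem}")
    loader = getattr(module, task)  # e.g. tdc.single_pred.ADME
    data = loader(name=name)        # fetches the dataset on first use
    return data.get_split()         # dict keyed by "train", "valid", "test"
```

With PyTDC installed and network access available, a call such as `load_split("single_pred", "ADME", "Caco2_Wang")` would be expected to return a dictionary of data frames, one per split. The problem type is validated before the import, so a mistyped category fails immediately rather than partway through a download.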

TDC-2 also introduced a variety of new data sources under the Resource Module, such as CELLXGENE and PrimeKG. More information is available in the tutorials in the GitHub repo.


Explore TDC Datasets