Overview of TDC Datasets
At its core, TDC collects ML tasks and associated datasets across therapeutic modalities and stages of discovery. These tasks and datasets have the following properties:
- Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers a variety of learning tasks going from wet-lab target identification to biomedical product manufacturing.
- Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing.
- Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.
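As a minimal sketch of what "AI/ML-ready" can look like in practice, the snippet below loads one dataset through the PyTDC Python package and counts the rows in each partition. It assumes PyTDC is installed (`pip install PyTDC`) and that network access is available for the first download. The dataset name `Caco2_Wang` is one ADME dataset picked for illustration, and `load_adme_split` and `split_sizes` are hypothetical helper names, not part of TDC.

```python
def load_adme_split(name="Caco2_Wang"):
    """Load one ADME dataset and return its default train/valid/test split."""
    # Imported lazily so this file can be imported without PyTDC installed.
    from tdc.single_pred import ADME

    data = ADME(name=name)   # downloads and caches the dataset on first use
    return data.get_split()  # dict keyed by 'train', 'valid', 'test'


def split_sizes(split):
    """Return the number of rows in each partition of a split."""
    return {part: len(frame) for part, frame in split.items()}


# Example usage (requires PyTDC and network access):
# split = load_adme_split()
# print(split_sizes(split))
# print(split["train"].head())  # columns: Drug_ID, Drug, Y
```

Each partition is a pandas DataFrame, so it can be passed to most ML tooling without further parsing.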
Machine Learning Tasks in TDC
ML tasks cover a range of therapeutic modalities, from small molecules to biologics such as antibodies, peptides, miRNAs, and gene editing therapies. They also map to stages of the drug discovery and development pipeline:
- Target discovery: Tasks to identify candidate drug targets.
- Activity modeling: Tasks to screen and generate individual or combinatorial candidates with high binding activity towards targets.
- Efficacy and safety: Tasks to optimize therapeutic signatures indicative of drug safety and efficacy.
- Manufacturing: Tasks in support of synthesis and manufacturing of therapeutics.
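To make the activity modeling stage more concrete, here is a hedged sketch built around the DTI (drug-target interaction) task. It assumes PyTDC is installed and uses the `DAVIS` dataset, whose labels are dissociation constants (Kd) in nanomolar. The helper `kd_nm_to_pkd` is a hypothetical name that applies the standard pKd = -log10(Kd in molar) conversion; it is not TDC's own transformation.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nanomolar to pKd.

    pKd = -log10(Kd in molar) = 9 - log10(Kd in nanomolar).
    Higher pKd means tighter binding.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)


def load_dti_split(name="DAVIS"):
    """Load a drug-target interaction dataset and return its default split."""
    # Imported lazily so this file can be imported without PyTDC installed.
    from tdc.multi_pred import DTI

    data = DTI(name=name)    # downloads and caches the dataset on first use
    return data.get_split()  # dict keyed by 'train', 'valid', 'test'


# Example usage (requires PyTDC and network access):
# train = load_dti_split()["train"]
# train["pKd"] = train["Y"].apply(kd_nm_to_pkd)
```

Unlike the single-molecule tasks, each row here pairs a drug with a target, so a model for this task has to encode both entities.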
Table: ML tasks in TDC, mapped to therapeutic modalities and to stages of discovery and development.
- Therapeutic modalities (columns): Small Molecules, MacroMolecules, Cell & Gene Therapy, Peptides
- Stages of discovery and development (columns): Target Discovery, Activity Modeling, Efficacy & Safety, Clinical Trial, Manufacturing
- ML tasks (rows): ADME, Tox, HTS, QM, Yields, Epitope, Develop, CRISPROutcome, DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TrialOutcome, MolGen, RetroSyn, Reaction, MPC
TDC-2 also introduced a variety of new data sources under the Resource Module, such as CELLXGENE and PrimeKG. More information is available in the tutorials in the GitHub repo.
Explore TDC Datasets