Overview of TDC Datasets
At its core, TDC collects ML tasks and associated datasets across therapeutic modalities and stages of discovery. These tasks and datasets have the following properties:
- Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers a variety of learning tasks going from wet-lab target identification to biomedical product manufacturing.
- Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing.
- Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.
Machine Learning Tasks in TDC
ML tasks cover a range of therapeutic modalities, including small molecules and biologics such as antibodies, peptides, miRNAs, and gene editing therapies. They also map to the stages of drug discovery and development pipelines:
- Target discovery: Tasks to identify candidate drug targets.
- Activity modeling: Tasks to screen and generate individual or combinatorial candidates with high binding activity towards targets.
- Efficacy and safety: Tasks to optimize therapeutic signatures indicative of drug safety and efficacy.
- Manufacturing: Tasks in support of synthesis and manufacturing of therapeutics.
Table: ML tasks mapped to therapeutic modalities and stages of discovery and development.
- Therapeutic Modalities (columns): Small Molecules, MacroMolecules, Cell & Gene Therapy
- Stages of Discovery and Development (columns): Target Discovery, Activity Modeling, Efficacy & Safety, Clinical Trial, Manufacturing
- ML Tasks (rows): ADME, Tox, HTS, QM, Yields, Epitope, Develop, CRISPROutcome, DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TrialOutcome, MolGen, RetroSyn, Reaction
More... find a comprehensive list of tasks in each ML Problem's page.
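The task names listed above can be used to look up a dataset loader. The sketch below is a minimal illustration, not the official interface. It assumes that the PyTDC package groups its loaders into the modules `tdc.single_pred`, `tdc.multi_pred`, and `tdc.generation`, and that each loader accepts a `name` argument. The grouping of tasks into modules, the helper names `module_for` and `load_dataset`, and the dataset name `Caco2_Wang` in the commented usage are assumptions made for illustration; check them against the TDC documentation before relying on them.

```python
import importlib

# Assumed grouping of the task names above into PyTDC problem modules.
TASK_MODULES = {
    "tdc.single_pred": [
        "ADME", "Tox", "HTS", "QM", "Yields",
        "Epitope", "Develop", "CRISPROutcome",
    ],
    "tdc.multi_pred": [
        "DTI", "DDI", "PPI", "GDA", "DrugRes", "DrugSyn",
        "PeptideMHC", "AntibodyAff", "MTI", "Catalyst", "TrialOutcome",
    ],
    "tdc.generation": ["MolGen", "RetroSyn", "Reaction"],
}


def module_for(task):
    """Return the (assumed) module path that hosts the loader for a task."""
    for module, tasks in TASK_MODULES.items():
        if task in tasks:
            return module
    raise KeyError(f"unknown task: {task}")


def load_dataset(task, name):
    """Hypothetical helper: import the loader lazily and instantiate it.

    The import happens inside the function, so this file can be read and
    run without PyTDC installed; only calling the helper requires it.
    """
    loader = getattr(importlib.import_module(module_for(task)), task)
    return loader(name=name)


# Example usage (requires `pip install PyTDC` and network access):
# data = load_dataset("ADME", "Caco2_Wang")
# split = data.get_split()  # expected to hold train/valid/test splits
```

Keeping the task-to-module mapping in a plain dictionary makes it easy to correct if a task lives in a different module than assumed here.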
TDC-2 also introduced a variety of new data sources under the Resource Module, such as CELLXGENE and PrimeKG. More information is available in the tutorials in the GitHub repo.
Explore TDC Datasets