Therapeutics Data Commons

Artificial intelligence foundation for therapeutic science

Artificial intelligence is poised to enable breakthroughs and discoveries in therapeutic science. Therapeutics Data Commons is a coordinated initiative to access and evaluate artificial intelligence capability across therapeutic modalities and stages of discovery. The Commons is a resource with AI-solvable tasks, AI-ready datasets, and curated benchmarks, providing an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All resources are integrated via an open Python library.
Therapeutic science is a field with broad opportunities for expansion, innovation, and impact. Curated AI-ready datasets, machine learning tasks, and benchmarks in the Commons serve as a meeting point between biochemical, biomedical, and machine learning scientists. Therapeutics Data Commons supports the development and evaluation of AI methods, with a strong emphasis on establishing which AI methods are most suitable for drug discovery applications and why. It can facilitate algorithmic and scientific advances and accelerate the development, validation, and transition of AI methods into biomedical and clinical implementation.
TDC at a glance
Key presentations and publications of the Commons
  • Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics. (Spotlight) NeurIPS 2024 Workshop on AI for New Drug Modalities [Paper] [Poster]
  • Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics. (Seminar) Western Bioinformatics Seminar Series: Alejandro Velez-Arce [Event]
  • TDC-2: Multimodal Foundation for Therapeutic Science. Molecular Machine Learning Conference (MoML2024). Hosted at Mila Agora on June 19th [Paper] [Conference] [Poster and Tweet]
  • A Foundation Model for Clinician-centered Drug Repurposing. Nature Medicine, 2024 [Paper] [TxGNN Explorer]
  • Artificial Intelligence Foundation for Therapeutic Science. Nature Chemical Biology, 2022 [Paper]
  • Machine Learning to Translate the Cancer Genome and Epigenome Session. AACR Annual Meeting [Meeting]
  • Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. NeurIPS, 2021 [Paper] [Poster]
  • Benchmarking Molecular Machine Learning in Therapeutics Data Commons. ELLIS ML4Molecules, 2021 [Paper] [Slides]
  • Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. Baylearn, 2021 [Slides] [Poster]
  • Few-Shot Learning for Network Biology. KDD Workshop on Data Mining in Bioinformatics [Keynote]
  • Actionable machine learning for drug discovery and development. Broad Institute, Models, Inference & Algorithms Seminar [Talk]
  • Therapeutics Data Commons. NSF-Harvard Symposium on Drugs for Future Pandemics, 2020 [Slides] [Video] [#futuretx20]
  • Graph Neural Networks for Biomedical Data. Machine Learning in Computational Biology [Schedule]
  • Graph Neural Networks for Identifying COVID-19 Drug Repurposing Opportunities. MIT AI Cures [Webpage]

Intuitive Interface

TDC software has few external dependencies. Any TDC dataset can be retrieved with just three lines of code.
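
As an illustration of that retrieval pattern, here is a minimal sketch using the open Python library (installed as PyTDC). The `ADME` loader and the `Caco2_Wang` dataset name are chosen only as an example and are not named on this page; the wrapper function `load_caco2_split` is a hypothetical helper added so that the import and the download happen only when it is called. Check the current library documentation before relying on these names.

```python
def load_caco2_split():
    """Fetch one example dataset and return its default train/valid/test split."""
    # Imported lazily: requires `pip install PyTDC` and network access on first use.
    from tdc.single_pred import ADME

    # The three lines in question: import (above), load, split.
    data = ADME(name="Caco2_Wang")  # example dataset name (assumption for illustration)
    return data.get_split()         # expected to map 'train', 'valid', 'test' to DataFrames
```

Calling `load_caco2_split()["train"].head()` would then show the first few training rows, assuming the download succeeds.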

From Bench to Bedside

TDC covers a wide range of learning tasks, from target discovery and activity screening to efficacy, safety, and manufacturing, across biomedical products such as small molecules, antibodies, and vaccines.

Numerous Data Functions

TDC provides extensive data functions, including data evaluators, meaningful data splits, data processors, and molecule generation oracles.
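
To give a sense of what a "meaningful" data split can look like, the sketch below keeps every group of related records (for example, molecules sharing a scaffold) inside a single partition, so that test items are not near-duplicates of training items. This is a simplified pure-Python illustration, not TDC's implementation; the function `group_split`, its parameters, and the toy records are all hypothetical.

```python
from collections import defaultdict


def group_split(records, group_of, frac=(0.7, 0.1, 0.2)):
    """Split records so that each group lands in exactly one partition.

    `group_of` maps a record to its group key (e.g. a precomputed scaffold).
    `frac` gives the target train/valid/test fractions; the test fraction is
    implied by whatever remains after train and valid are filled.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)

    # Place the largest groups first so they fill the training set.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(records)
    train_cap = frac[0] * n
    train_valid_cap = (frac[0] + frac[1]) * n

    split = {"train": [], "valid": [], "test": []}
    for members in ordered:
        n_train, n_valid = len(split["train"]), len(split["valid"])
        if n_train + len(members) <= train_cap:
            split["train"].extend(members)
        elif n_train + n_valid + len(members) <= train_valid_cap:
            split["valid"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Toy usage: 20 hypothetical molecules spread over 5 hypothetical scaffolds.
records = [("mol%d" % i, "scaffold%d" % (i % 5)) for i in range(20)]
split = group_split(records, group_of=lambda rec: rec[1])
```

Because whole groups are assigned at once, the realized partition sizes only approximate the requested fractions; that trade-off is the price of keeping related records together.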