Therapeutics Data Commons

Machine Learning Datasets and Tasks for Therapeutics

Therapeutics Data Commons (TDC) is an open-science initiative that provides AI/ML-ready datasets and learning tasks for therapeutics, spanning the discovery and development of safe and effective medicines. TDC offers an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All resources are integrated into an open-source Python library.

Our Vision

Therapeutics machine learning is an exciting field with incredible opportunities for expansion, innovation, and impact. The curated AI/ML-ready datasets, machine learning tasks, and benchmarks in TDC serve as a meeting point for domain scientists and machine learning scientists. TDC is the first unifying resource for systematically accessing and evaluating artificial intelligence methods across the entire range of therapeutics. It can facilitate algorithmic and scientific advances and accelerate the development and validation of machine learning methods, as well as their transition into biomedical and clinical practice.

TDC at a glance

TDC is a community-driven, open-science initiative. If you want to contribute, join us on Slack.

Intuitive Interface

TDC software has minimal dependencies on external packages. Any TDC dataset can be retrieved with just three lines of code.
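A minimal sketch of those three lines, assuming the PyTDC package from PyPI (`pip install PyTDC`) and the `Caco2_Wang` dataset from TDC's ADME task group. The import sits inside a function so the snippet can be read or pasted without triggering a download; the wrapper `load_caco2_split` and the helper `split_sizes` are hypothetical names added for illustration, not part of TDC.

```python
def load_caco2_split():
    """Fetch the Caco2_Wang ADME dataset and return its default split.

    Assumes PyTDC is installed; the first call downloads the data file
    into a local cache, so it needs network access.
    """
    from tdc.single_pred import ADME   # line 1: choose the task group
    data = ADME(name="Caco2_Wang")     # line 2: choose the dataset by name
    return data.get_split()            # line 3: dict of train/valid/test DataFrames


def split_sizes(split):
    """Hypothetical helper: count the rows in each partition of a split dict."""
    return {name: len(frame) for name, frame in split.items()}
```

Calling `split_sizes(load_caco2_split())` prints how many examples landed in each partition; the returned DataFrames can be passed straight to any model-training code.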

From Bench to Bedside

TDC covers a wide range of learning tasks, from target discovery and activity screening to efficacy, safety, and manufacturing, across biomedical products such as small molecules, antibodies, and vaccines.

Numerous Data Functions

TDC provides extensive data functions, including data evaluators, meaningful data splits, data processors, and molecule generation oracles.
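Of these functions, meaningful data splits are the least self-explanatory. A random split lets near-duplicate molecules appear in both training and test sets, which inflates scores. A scaffold split instead keeps every molecule that shares a core scaffold in the same partition, and TDC exposes this through options such as `get_split(method='scaffold')`. The sketch below is not TDC's implementation. It is a dependency-free illustration of the underlying group-split idea, assuming each item already carries a group key (for molecules, a scaffold string computed elsewhere). The function `group_split` and its greedy largest-group-first rule are assumptions modeled on the common deterministic scaffold-split recipe.

```python
import random
from collections import defaultdict


def group_split(items, group_of, frac=(0.7, 0.1, 0.2), seed=0):
    """Hypothetical group split: each group lands in exactly one partition.

    `group_of` maps an item to its group key. `frac` gives target
    train/valid/test fractions; test receives whatever is left over.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Shuffle with a fixed seed, then stable-sort by size (largest first),
    # so equal-sized groups are ordered randomly but reproducibly.
    keys = list(groups)
    random.Random(seed).shuffle(keys)
    keys.sort(key=lambda k: len(groups[k]), reverse=True)

    # Integer targets avoid floating-point drift in the cutoffs.
    n_train = round(frac[0] * len(items))
    n_valid = round(frac[1] * len(items))

    train, valid, test = [], [], []
    for key in keys:
        members = groups[key]
        if len(train) + len(members) <= n_train:
            train.extend(members)      # big, common groups fill train first
        elif len(valid) + len(members) <= n_valid:
            valid.extend(members)
        else:
            test.extend(members)       # rarer groups end up unseen in test
    return {"train": train, "valid": valid, "test": test}
```

Filling the training set with the largest groups first pushes rare scaffolds into validation and test, which is what makes this split a harder, more realistic test of generalization to new chemistry than a random split.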