Leaderboard Guidelines

TDC benchmarks provide a systematic framework for model development and evaluation. They can considerably accelerate machine-learning model development, validation, and the transition into production and clinical implementation.

Benchmark Group

Each dataset in TDC can be thought of as a benchmark. For a machine learning model to be useful for a particular therapeutic usage, the model needs to achieve consistently good performance across a set of datasets or tasks. For this reason, we group individual benchmarks in TDC into meaningful batches, which we call benchmark groups. All datasets and tasks within a benchmark group are carefully selected and are centered around a particular theme. Further, dataset splits and evaluation metrics are also carefully selected to reflect the challenges of real-world settings where the models are ultimately implemented.

An Example of a Benchmark Group

One key task in drug discovery is ADMET property prediction. A machine learning model that excels at ADMET needs to work well across a wide range of individual ADMET properties, such as Caco-2, HIA, and others. For this reason, TDC provides the ADMET Benchmark Group, which consists of 22 datasets drawn from ADME and Tox.

How to Access a Benchmark Group

TDC provides a programming framework to access the data in a benchmark group. We use the ADMET group as an example.

from tdc import BenchmarkGroup
group = BenchmarkGroup(name = 'ADMET_Group', path = 'data/')
predictions = {}

for benchmark in group:
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    ## --- train your model --- ##
    # y_pred_test: your model's predictions on the benchmark's test set
    predictions[name] = y_pred_test

group.evaluate(predictions)
# {'caco2_wang': {'mae': 4.328}, 'hia_hou': {'roc-auc': 0.802}, ...}
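
The ## --- train your model --- ## placeholder is where your own model goes. As a purely illustrative sketch, the snippet below fits a simple baseline on a single benchmark. It assumes the train_val and test splits are pandas DataFrames with a 'Drug' column of SMILES strings and a 'Y' label column, and that RDKit and scikit-learn are installed; the featurization and model choice are not prescribed by TDC.

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor
from tdc import BenchmarkGroup

def featurize(smiles_list):
    # 1024-bit Morgan fingerprints as a simple molecular representation
    # (handling of unparsable SMILES omitted for brevity)
    features = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits = 1024)
        arr = np.zeros((1024,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        features.append(arr)
    return np.stack(features)

group = BenchmarkGroup(name = 'ADMET_Group', path = 'data/')
benchmark = group.get('Caco2_Wang')
train_val, test = benchmark['train_val'], benchmark['test']

# Caco2_Wang is a regression benchmark, evaluated with MAE
model = RandomForestRegressor(n_estimators = 100, random_state = 0)
model.fit(featurize(train_val['Drug']), train_val['Y'])
y_pred_test = model.predict(featurize(test['Drug']))

group.evaluate({benchmark['name']: y_pred_test})
# {'caco2_wang': {'mae': ...}}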

For each benchmark, TDC provides a fixed test set and a train_val set. Users can construct their own training and validation split from the train_val set but must evaluate on the given test set for a fair comparison. Users can either (1) directly construct a customized training and validation set from the train_val variable, as sketched further below, or (2) use the TDC utility function and specify one of the supported split schemes together with a random seed:

train, valid = group.get_train_valid_split(benchmark = 'Caco2_Wang', split_type = 'default', seed = 42)
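
For option (1), any standard splitting utility can be applied to the train_val data frame, as long as the provided test set is left untouched. A minimal sketch, assuming train_val is a pandas DataFrame and scikit-learn is available (the 80/20 ratio is an arbitrary illustrative choice, not a TDC requirement):

from sklearn.model_selection import train_test_split

# Random 80/20 training/validation split built directly from train_val;
# any other strategy (e.g. scaffold-based) can be substituted here.
train, valid = train_test_split(train_val, test_size = 0.2, random_state = 42)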

To encourage performance robustness, TDC requires at least five independent model runs and reports the mean and standard deviation of the results. Here is an example that loads five different train and validation splits and uses the evaluate_many function to compute the mean and standard deviation in the submission format:

from tdc import BenchmarkGroup
group = BenchmarkGroup(name = 'ADMET_Group', path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    predictions = {}
    for benchmark in group:
        name = benchmark['name']
        train_val, test = benchmark['train_val'], benchmark['test']
        train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)
        ## --- train your model --- ##
        predictions[name] = y_pred_test
    predictions_list.append(predictions)

group.evaluate_many(predictions_list)
# Each value is [mean, std] across the five runs:
# {'caco2_wang': [6.328, 0.101], 'hia_hou': [0.5, 0.01], ...}

To access and evaluate each individual benchmark, use:

benchmark = group.get('Caco2_Wang')
predictions = {}

name = benchmark['name']
train_val, test = benchmark['train_val'], benchmark['test']
train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = 1)  # any fixed seed
## --- train your model --- ##
# y_pred: your model's predictions on the test set
predictions[name] = y_pred

group.evaluate(predictions)
# {'caco2_wang': {'mae': 0.234}}

You can retrieve the names of the individual benchmark datasets in a benchmark group via:

from tdc import utils
names = utils.retrieve_benchmark_names('ADMET_Group')
# ['caco2_wang', 'hia_hou', ....]

How to Submit Results

To submit the results of your model and be included in a TDC leaderboard, fill out this form.

The FAIR Guiding Principles

TDC leaderboards keep track of machine learning models across a variety of learning tasks. For this reason, we apply the FAIR4RS principles and implementation guidelines to any research software and AI/ML algorithms included in TDC leaderboards. Research software and AI/ML algorithms should be open and adhere to the FAIR principles (findable, accessible, interoperable, and reusable) to allow repeatability, reproducibility, and reuse. Further, TDC itself follows the FAIR guidelines for both datasets and AI/ML algorithms.