ADMET Benchmark Group
ADMET (absorption, distribution, metabolism, excretion, and toxicity) is a cornerstone of small-molecule drug discovery: it defines a drug's efficacy and toxicity profile. An ML model that could accurately predict all ADMET properties from the structural information of compounds would be highly valuable.
We formulate the ADMET Benchmark Group using 22 ADMET datasets in TDC. The ADMET Group contains the following datasets:
from tdc import utils
names = utils.retrieve_benchmark_names('ADMET_Group')
# ['caco2_wang', 'hia_hou', ....]
To access any benchmark in the group, for example Caco2_Wang, type the following:
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
benchmark = group.get('Caco2_Wang')
predictions = {}
name = benchmark['name']
train_val, test = benchmark['train_val'], benchmark['test']
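## --- train your model on train_val, then predict on test --- ##
# y_pred holds the model's predictions for the test set, in the same order as test
predictions[name] = y_pred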
group.evaluate(predictions)
# {'caco2_wang': {'mae': 0.234}}
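As a stand-in for the training step, a trivial baseline can be used to check that the evaluation pipeline runs end to end. The sketch below is not part of TDC: `mean_baseline` is a hypothetical helper that predicts the training-set mean for every test molecule, so it only makes sense for regression benchmarks such as Caco2_Wang. It assumes the training labels are available as a plain list of floats.

```python
from statistics import fmean


def mean_baseline(train_targets, n_test):
    """Predict the mean training target for each of n_test test samples.

    Hypothetical helper for illustration only; it ignores molecular structure.
    """
    if not train_targets:
        raise ValueError("train_targets must not be empty")
    mean_value = fmean(train_targets)
    return [mean_value] * n_test


# Toy labels standing in for the training targets of a regression benchmark.
y_pred = mean_baseline([1.0, 2.0, 3.0], n_test=2)
print(y_pred)  # [2.0, 2.0]
```

With the benchmark loaded above, and assuming the label column of `train_val` is named `Y`, this could be called as `mean_baseline(list(train_val['Y']), len(test))` and the result stored in `predictions[name]`.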
Follow the instructions on how to use the BenchmarkGroup class to obtain training, validation, and test sets, and on how to submit your model to the leaderboard.
For every dataset in the benchmark group, we use the scaffold split to partition the dataset into training, validation, and test sets. We hold out 20% of the data samples for the test set. The performance metrics are:
- For binary classification:
  - AUROC is used when the numbers of positive and negative samples are similar.
  - AUPRC is used when the number of positive samples is much smaller than the number of negative samples.
- For regression:
  - MAE is used for the majority of benchmarks.
  - Spearman's correlation coefficient is used for benchmarks that depend on factors beyond the chemical structure.
We encourage submissions that report results for the entire benchmark group. Still, we welcome and accept submissions that report partial results, for example, submissions with results for just one of the five ADMET categories.
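The scaffold split mentioned above keeps molecules that share a core scaffold in the same partition, so the test set contains scaffolds the model has not seen during training. The following is a minimal sketch of that idea, not TDC's implementation. It assumes a scaffold string has already been computed for each molecule (for example with a cheminformatics toolkit), and `scaffold_split` with its 70%/10% train/validation fractions is a hypothetical choice that leaves the remaining 20% for the test set.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.7, frac_valid=0.1):
    """Split sample indices so that no scaffold spans two partitions.

    scaffolds: one scaffold string per molecule (assumed precomputed).
    Returns (train, valid, test) index lists; whatever does not fit into
    train or valid goes to test.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first, so rare scaffolds tend to end up
    # in the validation and test sets.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = round(frac_train * n)
    valid_cap = round(frac_valid * n)

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold labels standing in for real scaffold strings.
scaffolds = ["A", "A", "A", "A", "B", "B", "B", "C", "D", "E"]
train, valid, test = scaffold_split(scaffolds)
print(train, valid, test)  # [0, 1, 2, 3, 4, 5, 6] [7] [8, 9]
```

Because whole scaffold groups are assigned at once, the realized fractions only approximate the requested ones on real data.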
Benchmark Data Summary
Absorption
Absorption measures how a drug travels from the site of administration to the site of action.
Dataset | Unit | Size | Task | Metric | Dataset Split |
---|---|---|---|---|---|
Caco2 | cm/s | 906 | Regression | MAE | Scaffold |
HIA | % | 578 | Binary | AUROC | Scaffold |
Pgp | % | 1,212 | Binary | AUROC | Scaffold |
Bioav | % | 640 | Binary | AUROC | Scaffold |
Lipo | log-ratio | 4,200 | Regression | MAE | Scaffold |
AqSol | log mol/L | 9,982 | Regression | MAE | Scaffold |
Distribution
Drug distribution refers to how a drug moves to and from the various tissues of the body, and to the amount of drug in those tissues.
Dataset | Unit | Size | Task | Metric | Dataset Split |
---|---|---|---|---|---|
BBB | % | 1,975 | Binary | AUROC | Scaffold |
PPBR | % | 1,797 | Regression | MAE | Scaffold |
VDss | L/kg | 1,130 | Regression | Spearman | Scaffold |
Metabolism
Drug metabolism measures how specialized enzymatic systems break down drugs. It determines the duration and intensity of a drug's action.
Dataset | Unit | Size | Task | Metric | Dataset Split |
---|---|---|---|---|---|
CYP2C9 Inhibition | % | 12,092 | Binary | AUPRC | Scaffold |
CYP2D6 Inhibition | % | 13,130 | Binary | AUPRC | Scaffold |
CYP3A4 Inhibition | % | 12,328 | Binary | AUPRC | Scaffold |
CYP2C9 Substrate | % | 666 | Binary | AUPRC | Scaffold |
CYP2D6 Substrate | % | 664 | Binary | AUPRC | Scaffold |
CYP3A4 Substrate | % | 667 | Binary | AUROC | Scaffold |
Excretion
Drug excretion is the removal of drugs from the body through various routes, including urine, bile, sweat, saliva, tears, milk, and stool.
Dataset | Unit | Size | Task | Metric | Dataset Split |
---|---|---|---|---|---|
Half Life | hr | 667 | Regression | Spearman | Scaffold |
CL-Hepa | µL/min/(10^6 cells) | 1,020 | Regression | Spearman | Scaffold |
CL-Micro | mL/min/g | 1,102 | Regression | Spearman | Scaffold |
Toxicity
Toxicity measures how much damage a drug could cause to organisms.
Dataset | Unit | Size | Task | Metric | Dataset Split |
---|---|---|---|---|---|
LD50 | log(1/(mol/kg)) | 7,385 | Regression | MAE | Scaffold |
hERG | % | 648 | Binary | AUROC | Scaffold |
Ames | % | 7,255 | Binary | AUROC | Scaffold |
DILI | % | 475 | Binary | AUROC | Scaffold |
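For scripting over the whole group, or for preparing a partial submission for a single category, the summary tables above can be captured as a small lookup. The sketch below transcribes the category, dataset label, task, and metric from the tables. The names `BENCHMARKS` and `datasets_in` are hypothetical, and the labels are the short table labels, which are not necessarily the identifiers expected by `group.get`.

```python
from collections import Counter

# (category, dataset label, task, metric), transcribed from the tables above.
BENCHMARKS = [
    ("Absorption", "Caco2", "Regression", "MAE"),
    ("Absorption", "HIA", "Binary", "AUROC"),
    ("Absorption", "Pgp", "Binary", "AUROC"),
    ("Absorption", "Bioav", "Binary", "AUROC"),
    ("Absorption", "Lipo", "Regression", "MAE"),
    ("Absorption", "AqSol", "Regression", "MAE"),
    ("Distribution", "BBB", "Binary", "AUROC"),
    ("Distribution", "PPBR", "Regression", "MAE"),
    ("Distribution", "VDss", "Regression", "Spearman"),
    ("Metabolism", "CYP2C9 Inhibition", "Binary", "AUPRC"),
    ("Metabolism", "CYP2D6 Inhibition", "Binary", "AUPRC"),
    ("Metabolism", "CYP3A4 Inhibition", "Binary", "AUPRC"),
    ("Metabolism", "CYP2C9 Substrate", "Binary", "AUPRC"),
    ("Metabolism", "CYP2D6 Substrate", "Binary", "AUPRC"),
    ("Metabolism", "CYP3A4 Substrate", "Binary", "AUROC"),
    ("Excretion", "Half Life", "Regression", "Spearman"),
    ("Excretion", "CL-Hepa", "Regression", "Spearman"),
    ("Excretion", "CL-Micro", "Regression", "Spearman"),
    ("Toxicity", "LD50", "Regression", "MAE"),
    ("Toxicity", "hERG", "Binary", "AUROC"),
    ("Toxicity", "Ames", "Binary", "AUROC"),
    ("Toxicity", "DILI", "Binary", "AUROC"),
]


def datasets_in(category):
    """Return the dataset labels belonging to one ADMET category."""
    return [name for cat, name, _, _ in BENCHMARKS if cat == category]


# How many benchmarks are scored with each metric.
metric_counts = Counter(metric for *_, metric in BENCHMARKS)

print(len(BENCHMARKS))           # 22
print(datasets_in("Excretion"))  # ['Half Life', 'CL-Hepa', 'CL-Micro']
print(metric_counts["AUROC"])    # 8
```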