Counterfactual Prediction Benchmark Group

We define a task for predicting the gene expression responses of single cells to chemical and genetic perturbations, with the aim of measuring model generalization across cell lines and perturbation types. Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. Furthermore, counterfactual prediction of drug-based perturbations at single-cell resolution supports the development of cell-type-specific drugs and treatments, facilitating precision medicine [10]. The predictive, non-generative task is then formalized as a function of a cell, with corresponding attributes such as cell line, disease, and tissue, and a perturbation, such as a drug type or a CRISPR-based perturbation, which outputs a count for gene expression of the cell after the input perturbation.
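As a rough illustration of the function described above, the sketch below pairs a cell record with a perturbation record and returns per-gene counts. All names here (Cell, Perturbation, predict_expression, and the expression field holding pre-perturbation counts) are hypothetical and are not part of the TDC API; the "no-effect" baseline is a placeholder for a real model, not a recommended method.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cell:
    # Attributes named in the task description.
    cell_line: str
    disease: str
    tissue: str
    # Hypothetical field: gene -> count before perturbation.
    expression: Dict[str, float]


@dataclass(frozen=True)
class Perturbation:
    kind: str    # e.g. "drug" or "crispr" (illustrative labels)
    target: str  # drug name or targeted gene


def predict_expression(cell: Cell, perturbation: Perturbation) -> Dict[str, float]:
    """No-effect baseline: predict that the perturbation changes nothing.

    A real model would use both arguments; this placeholder only fixes
    the input/output shape of the task.
    """
    return dict(cell.expression)


# Made-up example values, not taken from scPerturb.
cell = Cell("K562", "leukemia", "blood", {"GENE_A": 12.0, "GENE_B": 0.0})
pert = Perturbation(kind="crispr", target="GENE_A")
predicted = predict_expression(cell, pert)
```

Any model that maps the same two inputs to a per-gene output of the same shape could replace the baseline.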

In TDC-2, we use the scPerturb [18] datasets to build benchmarks for this task. More details to be announced.

To access a benchmark in the group, use the following code:

from tdc.benchmark_group import counterfactual_group
group = counterfactual_group.CounterfactualGroup()
train, val = group.get_train_valid_split()
test = group.get_test()

## --- train your model --- ##

predictions = model.predict(test)  # modify as per your model code and test output
out = group.evaluate(predictions)

Follow the instructions on how to use the BenchmarkGroup class and obtain training, validation, and test sets, and how to submit your model to the leaderboard.

The evaluation metric is R-squared. More details to be announced.
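For reference, R-squared (the coefficient of determination) can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal pure-Python version under the assumption that observed and predicted expression values are flattened into a single list; how the benchmark aggregates over genes, cells, or perturbations is not specified here, and r_squared is a hypothetical helper, not the function called by group.evaluate.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up flattened expression values, not taken from scPerturb.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean_only = [sum(observed) / len(observed)] * len(observed)

score_perfect = r_squared(observed, observed)  # perfect predictions
score_mean = r_squared(observed, mean_only)    # predicting the mean everywhere
```

A perfect prediction scores 1, predicting the mean of the observed values everywhere scores 0, and predictions worse than the mean give negative scores.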