Reaction Yields Prediction Task Overview

Definition: The vast majority of small-molecule drugs are synthesized through chemical reactions. Many factors during a reaction can lead to a suboptimal conversion of reactants to products, i.e. a suboptimal yield. Formally, the yield is the percentage of the reactants that is successfully converted to the target product. This learning task aims to predict the yield of a given single chemical reaction.
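As a worked illustration of the definition above, the following minimal sketch computes a percent yield from the amount of product obtained and the theoretical maximum. The function name `percent_yield`, its parameters, and the numbers are hypothetical, and a 1:1 stoichiometry with a single limiting reactant is assumed for simplicity.

```python
def percent_yield(actual_moles: float, theoretical_moles: float) -> float:
    """Return the yield as a percentage of the theoretical maximum."""
    if theoretical_moles <= 0:
        raise ValueError("theoretical_moles must be positive")
    return 100.0 * actual_moles / theoretical_moles


# Hypothetical example: 0.5 mol of limiting reactant (1:1 stoichiometry assumed)
# can give at most 0.5 mol of product; 0.35 mol of product was obtained.
example_yield = percent_yield(actual_moles=0.35, theoretical_moles=0.5)
print(f"{example_yield:.1f}%")
```

The label Y in the datasets below is a yield of this kind; how each source measured and scaled it is not specified here.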

Impact: To maximize the synthesis efficiency of products of interest, an accurate prediction of the reaction yield could help chemists plan ahead and switch to alternative reaction routes, thereby avoiding hours and materials spent on wet-lab experiments and reducing the number of attempts.

Generalization: The models are expected to extrapolate to unseen reactions with diverse chemical structures and reaction types.

Product: Small-molecule.

Pipeline: Manufacturing - Synthesis planning.

Buchwald-Hartwig

Dataset Description: Ahneman et al. performed high-throughput experiments on Pd-catalysed Buchwald–Hartwig C-N cross coupling reactions, measuring the yield of each reaction.

Task Description: Given the set of reactants and products X, predict the yield Y.
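Because this is a regression task, a simple reference point is a baseline that always predicts the mean training yield. The sketch below scores such a baseline with mean absolute error (MAE). The choice of MAE as the metric and all yield values are assumptions made for illustration; none of the numbers come from the dataset.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted yields."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical yields in percent, not taken from any dataset.
train_yields = [10.0, 30.0, 50.0, 70.0]
test_yields = [20.0, 60.0]

# Baseline: predict the mean training yield for every test reaction.
baseline = mean(train_yields)
predictions = [baseline] * len(test_yields)
baseline_mae = mean_absolute_error(test_yields, predictions)
print(baseline, baseline_mae)
```

A learned model would be expected to reach a lower error than this constant predictor on the held-out reactions.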

Dataset Statistics: 55,370 reactions.

Dataset Split: Random Split

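from tdc.single_pred import Yields

# Load the Buchwald-Hartwig yields dataset.
data = Yields(name='Buchwald-Hartwig')
# Split the data into training, validation and test subsets.
split = data.get_split()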
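To make the idea of a random split concrete, here is a minimal, library-independent sketch: shuffle the reactions with a fixed seed, then cut them into three parts. It is not TDC's implementation. The function name `random_split`, the 0.7/0.1/0.2 fractions, and the seed are assumptions chosen for illustration.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items with a fixed seed and cut them into train/valid/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # everything that remains
    }


# Hypothetical reaction identifiers standing in for dataset rows.
reaction_ids = range(100)
parts = random_split(reaction_ids)
print({name: len(rows) for name, rows in parts.items()})
```

Fixing the seed keeps the split reproducible across runs. Note that a random split places structurally similar reactions in both training and test sets, so it measures interpolation more than the extrapolation described under Generalization.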

References:

[1] Sandfort et al. “A structure-based platform for predicting chemical reactivity.” Chem (2020).

[2] Ahneman et al. “Predicting reaction performance in C–N cross-coupling using machine learning.” Science 360.6385 (2018): 186-190.

[3] Schwaller et al. “Prediction of Chemical Reaction Yields using Deep Learning.” ChemRxiv (2020).

Dataset License: Not Specified. CC BY 4.0.


USPTO

Dataset Description: TDC extracts the reaction yields from the full USPTO (United States Patent and Trademark Office) dataset.

Task Description: Given the set of reactants and products X, predict the yield Y.

Dataset Statistics: 853,638 reactions.

Dataset Split: Random Split

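from tdc.single_pred import Yields

# Load the yields extracted from the USPTO dataset.
data = Yields(name='USPTO_Yields')
# Split the data into training, validation and test subsets.
split = data.get_split()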

References:

[1] Lowe, Daniel Mark. Extraction of chemical structures and reactions from the literature. Diss. University of Cambridge, 2012.

Dataset License: CC0.