Catalyst Prediction Task Overview

Definition: A catalyst increases the rate of a chemical reaction. Catalysts are not consumed in the reactions they catalyze and can act repeatedly. This learning task aims to predict the catalyst for a reaction given both the reactant molecules and the product molecules.

Impact: Conventionally, chemists design and synthesize catalysts by trial and error guided by chemical intuition, which is usually time-consuming and costly. Machine learning models can automate and accelerate this process, help explain catalytic mechanisms, and provide insight into novel catalyst design.

Generalization: In real-world discovery, the molecular structures in the reactions of interest evolve over time. We expect a model to generalize to unseen molecules and reactions.

Product: Small-molecule.

Pipeline: Manufacturing - synthesis planning.

USPTO

Dataset Description: USPTO (United States Patent and Trademark Office) 50K consists of 50K extracted atom-mapped reactions with 10 reaction types. TDC selects the most common catalysts, those that occur more than 100 times.
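
The frequency filter described above can be sketched in a few lines. This is a minimal illustration, not TDC's actual preprocessing code: the function name `select_common_catalysts`, the tuple layout, and the alphabetical assignment of label indices are all assumptions made for the example, and the toy data uses a threshold of 1 instead of 100 so the effect is visible.

```python
from collections import Counter

def select_common_catalysts(reactions, min_count=100):
    """Keep reactions whose catalyst appears more than `min_count` times.

    `reactions` is a list of (reactant, product, catalyst) tuples
    (a hypothetical layout chosen for this sketch).
    Returns the filtered reactions with integer labels, plus the label map.
    """
    counts = Counter(catalyst for _, _, catalyst in reactions)
    # Strictly "more than" the threshold, matching the description above.
    kept = sorted(c for c, n in counts.items() if n > min_count)
    # Assumption: label indices follow alphabetical order of the catalysts.
    label_of = {catalyst: idx for idx, catalyst in enumerate(kept)}
    labeled = [(r, p, label_of[c]) for r, p, c in reactions if c in label_of]
    return labeled, label_of

# Toy data: "Ni" occurs only once and is dropped at min_count=1.
toy = [("A", "B", "Pd")] * 3 + [("C", "D", "Pt")] * 2 + [("E", "F", "Ni")]
labeled, label_of = select_common_catalysts(toy, min_count=1)
print(label_of)
print(len(labeled))
```

Reactions whose catalyst falls below the threshold are removed entirely, so every remaining reaction carries one of the retained labels.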

Task Description: Given the set of reactants and products X, predict the catalyst Y from the set of most common catalysts.
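
Framed this way, the task is a multi-class classification problem over integer catalyst indices. As a rough point of comparison for any learned model, a majority-class baseline can be sketched as follows. The helper names and the toy labels are hypothetical and are not part of TDC.

```python
from collections import Counter

def fit_majority_baseline(train_labels):
    """Return the most frequent catalyst index in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of reactions whose predicted catalyst index is correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical toy labels: each integer is a catalyst index Y.
train_y = [0, 0, 1, 2, 0]
test_y = [0, 1, 0, 0]

majority = fit_majority_baseline(train_y)
preds = [majority] * len(test_y)  # always predict the most common catalyst
print(majority, accuracy(test_y, preds))
```

A model that uses the reactant and product structures should beat this baseline; if it does not, it is likely learning only the label frequencies.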

Dataset Statistics: 721,799 reactions, 712,757 reactants and 702,940 products with 888 common catalyst types.

Dataset Split: Random Split

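# Load the USPTO catalyst prediction dataset from TDC.
from tdc.multi_pred import Catalyst
data = Catalyst(name='USPTO_Catalyst')
# Split into train/validation/test sets (random split).
split = data.get_split()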
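
To illustrate what a random split does, here is a minimal, library-free sketch. It is not TDC's implementation: the function `random_split`, the 70/10/20 fractions, the seed, and the `train`/`valid`/`test` keys are assumptions chosen for the example.

```python
import random

def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle `items` with a fixed seed and cut them into three parts."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains after train and valid.
        "test": shuffled[n_train + n_valid:],
    }

parts = random_split(range(10))
print({name: len(rows) for name, rows in parts.items()})
```

Because assignment is uniformly random, molecules in the test set may closely resemble those seen in training, so a random split measures in-distribution performance rather than generalization to unseen chemistry.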

Note: To find out which catalyst a label index corresponds to, use:

from tdc.utils import get_label_map
get_label_map(name = 'USPTO_Catalyst', task = 'Catalyst')

References:

[1] Lowe, Daniel Mark. Extraction of chemical structures and reactions from the literature. Diss. University of Cambridge, 2012.

[2] Gao, Hanyu, et al. “Using machine learning to predict suitable conditions for organic reactions.” ACS Central Science 4.11 (2018): 1465-1476.

Dataset License: CC0.