Paratope Prediction Task Overview

Definition: Antibodies, also known as immunoglobulins, are large, Y-shaped proteins that identify and neutralize pathogens by recognizing a molecule unique to the pathogen, called an antigen. They play essential roles in the immune system and are powerful tools in research and diagnostics. A paratope, also called an antigen-binding site, is the region of the antibody that selectively binds the epitope, the part of the antigen that the antibody recognizes. Although the hypervariable regions responsible for binding are roughly known, pinpointing the individual interacting amino acids remains challenging. The task is to predict which amino acids of an antibody occupy active positions, that is, positions that bind the antigen.

Impact: Identifying the amino acids at critical positions can accelerate the engineering of novel antibodies.

Generalization: Models are expected to generalize to unseen antibodies with distinct structures and functions.

Product: Antibody.

Pipeline: Activity, efficacy and safety.

SAbDab, Liberis et al.

Dataset Description: Paratope prediction aims to identify the active binding region of an antibody. This dataset comes from Parapred, which curated it from SAbDab. It includes both heavy and light chain sequences.

Task Description: Token-level classification. Given an amino acid sequence, predict which amino acid tokens are active in binding, i.e. X is an amino acid sequence and Y is a list of indices of the active positions in X.
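Because Y is a list of indices rather than one label per residue, token-level models usually need it expanded into a 0/1 label for every position in X. The following is a minimal sketch of that conversion, not part of the dataset's own tooling. The helper name `indices_to_labels`, the toy sequence, and the example indices are hypothetical, and the indices are assumed to be 0-based; check the loaded data before relying on that.

```python
from typing import List


def indices_to_labels(sequence: str, active_positions: List[int]) -> List[int]:
    """Expand a list of active indices into one 0/1 label per residue.

    Assumes 0-based indices (an assumption, not confirmed by the dataset card).
    """
    labels = [0] * len(sequence)
    for pos in active_positions:
        if not 0 <= pos < len(sequence):
            raise ValueError(f"index {pos} is outside a sequence of length {len(sequence)}")
        labels[pos] = 1
    return labels


# Hypothetical toy example, not taken from the dataset.
sequence = "QVQLVESGGG"
active_positions = [2, 5, 6]

labels = indices_to_labels(sequence, active_positions)
print(labels)  # [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]
```

The resulting list has the same length as the sequence, so it can be aligned directly with per-residue model outputs.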

Dataset Statistics: 1,023 antibody chain sequences.

Dataset Split: Random Split

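from tdc.single_pred import Paratope

# Load the Parapred-curated SAbDab paratope dataset.
data = Paratope(name='SAbDab_Liberis')
# Split into train, validation and test sets (random split).
split = data.get_split()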
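To illustrate what a random split does to a dataset of this size, here is a standalone sketch that shuffles item identifiers with a fixed seed and cuts them into three parts. It is an illustration only and does not reproduce the library's own splitter: the function `random_split`, the 70/10/20 fractions, and the seed value are assumptions made for this example.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_split(
    items: Sequence[int],
    frac: Tuple[float, float, float] = (0.7, 0.1, 0.2),  # assumed fractions
    seed: int = 42,  # assumed seed
) -> Dict[str, List[int]]:
    """Shuffle items reproducibly and cut them into train/valid/test."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder goes to test
    }


# One identifier per antibody chain (1,023 chains in this dataset).
chain_ids = list(range(1023))
parts = random_split(chain_ids)
sizes = {name: len(ids) for name, ids in parts.items()}
print(sizes)
```

Note that a purely random split can place closely related chains in different partitions, which is worth keeping in mind when judging generalization to unseen antibodies.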

References:

[1] Liberis, Edgar, et al. “Parapred: antibody paratope prediction using convolutional and recurrent neural networks.” Bioinformatics 34.17 (2018): 2944-2950.

[2] Dunbar, James, et al. “SAbDab: the structural antibody database.” Nucleic acids research 42.D1 (2014): D1140-D1146.

Dataset License: CC BY 3.0.