Epitope Prediction Task Overview

Definition: An epitope, also known as an antigenic determinant, is the region of a pathogen that can be recognized by an antibody and trigger an adaptive immune response. The task is to classify each site in an antigen protein sequence as active or non-active.

Impact: Identifying potential epitopes is of primary importance in many clinical and biotechnological applications, such as vaccine design and antibody development, and for our general understanding of the immune system.

Generalization: Models are expected to generalize to the amino acid sequences of antigens from unseen pathogens, covering a diverse set of structures and functions.

Product: Immunotherapy.

Pipeline: Target discovery.

IEDB, Jespersen et al.

Dataset Description: Epitope prediction aims to predict the active region of an antigen. This dataset comes from BepiPred, which curated it from IEDB. It contains B-cell epitopes and non-epitope amino acids determined from crystal structures.

Task Description: Token-level classification. Given an antigen's amino acid sequence, predict which amino acid tokens are active in binding; i.e., X is an amino acid sequence and Y is a list of indices of the active positions in X.
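To make the X/Y format concrete, here is a minimal sketch that expands an index list Y into one 0/1 label per residue of X, which is the shape most token-level classifiers train on. The function name `indices_to_labels`, the toy sequence, and the assumption that indices are 0-based are illustrative and not taken from the dataset documentation.

```python
from typing import List, Sequence


def indices_to_labels(sequence: str, active_indices: Sequence[int]) -> List[int]:
    """Return one 0/1 label per residue; 1 marks an active (epitope) position.

    Assumes 0-based indices; adjust if the data uses a different convention.
    """
    labels = [0] * len(sequence)
    for idx in active_indices:
        if not 0 <= idx < len(sequence):
            raise ValueError(
                f"index {idx} is outside the sequence (length {len(sequence)})"
            )
        labels[idx] = 1
    return labels


# Hypothetical toy example, not a record from the dataset.
toy_sequence = "MKTAYIAK"
toy_active = [1, 2, 5]
toy_labels = indices_to_labels(toy_sequence, toy_active)
print(toy_labels)  # [0, 1, 1, 0, 0, 1, 0, 0]
```

The dense label vector has the same length as the sequence, so it can be paired position by position with per-residue model outputs.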

Dataset Statistics: 3,159 antigens.

Dataset Split: Random Split

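from tdc.single_pred import Epitope

# Load the IEDB-derived epitope dataset.
data = Epitope(name='IEDB_Jespersen')
# Random split; returns a dict with 'train', 'valid', and 'test' sets.
split = data.get_split()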
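As a rough picture of what a random split does, the sketch below shuffles a list of records with a fixed seed and cuts it into train, validation, and test parts. It is not the library's implementation: the helper `random_split`, the 0.7/0.1/0.2 fractions, the seed value, and the placeholder antigen IDs are all assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T],
    frac: Tuple[float, float, float] = (0.7, 0.1, 0.2),  # assumed fractions
    seed: int = 42,  # assumed seed, fixed so the split is reproducible
) -> Dict[str, List[T]]:
    """Shuffle the items once, then slice them into three disjoint parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # Whatever is left after train and valid goes to test.
        "test": shuffled[n_train + n_valid:],
    }


# Hypothetical placeholder IDs standing in for antigen records.
antigen_ids = [f"antigen_{i}" for i in range(10)]
parts = random_split(antigen_ids)
print({name: len(rows) for name, rows in parts.items()})
```

Because the split is random over antigens, similar sequences can land in both train and test; that is worth keeping in mind when the goal is generalization to unseen pathogens.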

References:

[1] Vita, Randi, et al. “The immune epitope database (IEDB): 2018 update.” Nucleic acids research 47.D1 (2019): D339-D343.

[2] Jespersen, Martin Closter, et al. “BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.” Nucleic acids research 45.W1 (2017): W24-W29.

Dataset License: CC BY 4.0.

PDB, Jespersen et al.

Dataset Description: Epitope prediction aims to predict the active region of an antigen. This dataset comes from BepiPred, which curated it from PDB. It contains B-cell epitopes and non-epitope amino acids determined from crystal structures.

Task Description: Token-level classification. Given an antigen's amino acid sequence, predict which amino acid tokens are active in binding; i.e., X is an amino acid sequence and Y is a list of indices of the active tokens in X.
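Since Y is a list of indices, a predicted index list can be compared with it by simple set overlap. The sketch below computes per-antigen precision and recall this way. The source does not prescribe an evaluation metric, so the choice of metric, the function name `index_overlap_scores`, and the toy index lists are assumptions for illustration only.

```python
from typing import Dict, Iterable


def index_overlap_scores(
    true_indices: Iterable[int], predicted_indices: Iterable[int]
) -> Dict[str, float]:
    """Precision and recall of predicted active positions for one antigen."""
    true_set, pred_set = set(true_indices), set(predicted_indices)
    hits = len(true_set & pred_set)  # positions predicted active that are active
    # Return 0.0 rather than dividing by zero when a list is empty.
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical toy example: two of the four predicted positions are correct,
# and two of the three true positions are recovered.
scores = index_overlap_scores(true_indices=[1, 2, 5], predicted_indices=[2, 5, 6, 7])
print(scores)
```

Scores computed per antigen can then be averaged over the test set, or the index sets can be pooled across antigens first; the two choices weight long and short sequences differently.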

Dataset Statistics: 447 antigens.

Dataset Split: Random Split

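from tdc.single_pred import Epitope

# Load the PDB-derived epitope dataset.
data = Epitope(name='PDB_Jespersen')
# Random split; returns a dict with 'train', 'valid', and 'test' sets.
split = data.get_split()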

References:

[1] Jespersen, Martin Closter, et al. “BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.” Nucleic acids research 45.W1 (2017): W24-W29.

[2] Berman, Helen M., et al. “The protein data bank.” Nucleic acids research 28.1 (2000): 235-242.

Dataset License: CC BY 4.0.