Epitope Prediction Task Overview
Definition: An epitope, also known as an antigenic determinant, is the region of a pathogen that an antibody recognizes and that triggers an adaptive immune response. The task is to classify each site in an antigen protein sequence as active or non-active.
Impact: Identifying potential epitopes is of primary importance in many clinical and biotechnological applications, such as vaccine design and antibody development, and for our general understanding of the immune system.
Generalization: Models are expected to generalize to the amino acid sequences of antigens from unseen pathogens, which span a diverse set of structures and functions.
Product: Immunotherapy.
Pipeline: Target discovery.
IEDB, Jespersen et al.
Dataset Description: Epitope prediction aims to predict the active region of an antigen. This dataset comes from BepiPred, which curated it from IEDB. It contains B-cell epitopes and non-epitope amino acids determined from crystal structures.
Task Description: Token-level classification. Given an amino acid sequence, predict which amino acid tokens are active in binding, i.e., X is an amino acid sequence and Y is a list of indices of the active positions in X.
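Because Y is a list of indices rather than a label per residue, token-level models usually need it expanded into one label per position. The following is a minimal sketch of that conversion. The helper name `indices_to_labels` and the toy sequence are hypothetical, and 0-based indexing is an assumption that should be checked against the loaded data.

```python
from typing import List, Sequence


def indices_to_labels(sequence: str, active_indices: Sequence[int]) -> List[int]:
    """Return one 0/1 label per residue; 1 marks a position listed in active_indices.

    Assumes 0-based indices (an assumption, not confirmed by the dataset page).
    """
    labels = [0] * len(sequence)
    for idx in active_indices:
        if not 0 <= idx < len(sequence):
            raise IndexError(
                f"index {idx} is outside a sequence of length {len(sequence)}"
            )
        labels[idx] = 1
    return labels


# Toy example: a made-up 8-residue sequence with three active positions.
toy_labels = indices_to_labels("MKTAYIAK", [1, 2, 6])
print(toy_labels)  # [0, 1, 1, 0, 0, 0, 1, 0]
```

Raising on an out-of-range index, instead of silently skipping it, makes an off-by-one indexing mismatch visible early.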
Dataset Statistics: 3,159 antigens.
Dataset Split: Random Split
from tdc.single_pred import Epitope

# Load the IEDB-derived epitope dataset
data = Epitope(name='IEDB_Jespersen')
# Split the data into train/valid/test partitions
split = data.get_split()
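To illustrate what a random split does conceptually, here is a small standard-library sketch that shuffles antigen identifiers and cuts them into three partitions. It is not the library's implementation: the function `random_split`, the 70/10/20 fractions, the seed, and the toy identifiers are all illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_split(
    items: Sequence[str],
    frac: Tuple[float, float, float] = (0.7, 0.1, 0.2),  # assumed fractions
    seed: int = 42,  # assumed seed, fixed for reproducibility
) -> Dict[str, List[str]]:
    """Shuffle items with a fixed seed and cut them into train/valid/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test partition takes whatever remains, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }


# Toy identifiers standing in for antigen IDs.
antigen_ids = [f"antigen_{i}" for i in range(25)]
parts = random_split(antigen_ids)
print({name: len(ids) for name, ids in parts.items()})
```

Splitting whole antigens, rather than individual residues, keeps all positions of one sequence in the same partition.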
References:
Dataset License: CC BY 4.0.
PDB, Jespersen et al.
Dataset Description: Epitope prediction aims to predict the active region of an antigen. This dataset comes from BepiPred, which curated it from PDB. It contains B-cell epitopes and non-epitope amino acids determined from crystal structures.
Task Description: Token-level classification. Given the antigen's amino acid sequence, predict which amino acid tokens are active in binding, i.e., X is an amino acid sequence and Y is a list of indices of the active tokens in X.
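Since both the target and a model's output can be expressed as sets of active positions, per-residue metrics for a single antigen follow directly from set operations. The sketch below shows one way to compute them. The function `token_metrics` and the example indices are hypothetical, and this page does not prescribe a specific evaluation metric.

```python
from typing import Dict, Iterable


def token_metrics(
    seq_len: int, true_idx: Iterable[int], pred_idx: Iterable[int]
) -> Dict[str, float]:
    """Per-residue precision, recall, F1, and accuracy for one antigen."""
    true_set, pred_set = set(true_idx), set(pred_idx)
    tp = len(true_set & pred_set)   # predicted active and truly active
    fp = len(pred_set - true_set)   # predicted active but not active
    fn = len(true_set - pred_set)   # active positions that were missed
    tn = seq_len - tp - fp - fn     # every remaining position
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / seq_len if seq_len else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy example: a 10-residue antigen with made-up true and predicted positions.
print(token_metrics(10, true_idx=[2, 3, 4], pred_idx=[3, 4, 5, 6]))
```

Epitope residues are typically a minority of positions, so accuracy alone can look high for a model that predicts no active sites; precision, recall, and F1 are reported alongside it for that reason.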
Dataset Statistics: 447 antigens.
Dataset Split: Random Split
from tdc.single_pred import Epitope

# Load the PDB-derived epitope dataset
data = Epitope(name='PDB_Jespersen')
# Split the data into train/valid/test partitions
split = data.get_split()
References:
[2] Berman, Helen M., et al. “The protein data bank.” Nucleic acids research 28.1 (2000): 235-242.
Dataset License: CC BY 4.0.