Protein-Peptide Interaction Prediction Task Overview

Definition: Protein-peptide interactions underlie many biological processes and have significant implications for research and therapeutic development. Peptides matter in drug discovery for several reasons: their involvement in these processes, their therapeutic potential, and their ability to modulate targets that are difficult to drug, such as protein-protein interactions.
Protein-peptide interactions are essential in regulating various cellular functions such as signal transduction, protein trafficking, and epigenetic regulation. For instance, peptides derived from histone proteins play crucial roles in chromatin structure and gene expression regulation (Plach et al., 2017).
Peptides can serve as inhibitors or modulators of protein-protein interactions, offering a potential avenue for drug development, especially for targets that are challenging for small molecules (Kilburg & Gallicchio, 2016).
Peptides and peptide derivatives are increasingly being recognized for their potential as therapeutic agents to target protein-protein interactions, which are often implicated in diseases like cancer (Nevola & Giralt, 2015).
Studying protein-peptide interactions provides insights into the structural basis of these interactions, which can be leveraged to design better therapeutic molecules (London et al., 2010).

Despite the availability of several benchmarks for protein-protein interactions, benchmarks for protein-peptide binding affinity prediction remain comparatively scarce. For example, the well-known multi-task benchmark for Protein sEquence undERstanding (PEER) (Xu, Minghao et al., 2022) does not define a protein-peptide binding affinity prediction task. Protein-peptide and protein-protein binding affinity prediction involve similar underlying biological interactions, but they differ significantly in complexity and in the methods used to predict them (Abdin, Osana et al., 2022).

Impact: Machine learning models can rapidly predict binding affinities between proteins and peptides, which is essential for identifying potential therapeutic peptides that can modulate protein functions. This accelerates the drug discovery process by reducing the need for extensive experimental assays (Li et al., 2019).
Traditional methods of determining protein-peptide binding affinities, such as experimental binding assays, are expensive and time-consuming. Machine learning offers a cost-effective alternative by utilizing existing data to predict new interactions without the need for additional wet-lab experiments (Kundu et al., 2018).
Protein-peptide interactions are complex and involve various biophysical and biochemical properties. Machine learning models can handle this complexity by integrating multiple data types (e.g., sequence, structural, physicochemical properties) to make accurate predictions (Aranha et al., 2020).
Predicting protein-peptide binding affinities is vital for developing personalized medicine approaches, such as designing peptide-based vaccines and immunotherapies. Accurate predictions help in selecting peptides that bind strongly to specific proteins, enhancing the effectiveness of treatments (Bhattacharya et al., 2017).

Generalization: There is generally less experimental data available for protein-peptide interactions compared to protein-protein interactions, which hampers the development and validation of predictive models (Chang & Perez, 2022). Capturing the dynamic nature of protein-peptide interactions requires advanced sampling techniques and computational resources, making the prediction of binding affinities more complex (Antes et al., 2014). TDC-2 provides benchmarks integrating newly discovered peptides to test ML models on their ability to generalize to cutting-edge peptidomimetics.

Product: biopharmaceuticals

Pipeline: Target discovery, lead optimization.

Ye X et al. [1]

Dataset Description: This dataset contains affinity selection-mass spectrometry (AS-MS) data of ligands discovered against single biomolecular targets (MDM2, ACE2, 12ca5) by the Pentelute Lab at MIT. Several of these AS-MS-discovered ligands were taken forward for experimental validation, in which their binding affinity (KD) to the listed target protein was measured by biolayer interferometry (BLI). If a ligand is listed as a "putative binder," AS-MS alone was used to isolate it against the target; KD < 1 µM is required and is often observed in orthogonal assays, though there is some (< 50%) chance that the ligand is nonspecific. Most of the ligands are putative binders (4446 provided). For the 34 ligands characterized by BLI, the average KD is 266 ± 44 nM and the median KD is 9.4 nM.

Task Description: Binary classification. Given a pair of amino acid sequences (a target protein and a peptide ligand), predict whether they interact.
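One simple way to turn a (target, peptide) sequence pair into a fixed-length input for a binary classifier is amino-acid composition. The sketch below is only an illustration: the featurization is an assumption, not something TDC prescribes, and `aa_composition` and `pair_features` are hypothetical helper names. Any residue outside the 20 standard one-letter codes is simply skipped.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); the ordering fixes the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (hypothetical helper).

    Letters outside AMINO_ACIDS are ignored; an empty result gives all zeros.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(protein_seq: str, peptide_seq: str) -> list[float]:
    """Concatenate target and peptide compositions into one 40-value vector."""
    return aa_composition(protein_seq) + aa_composition(peptide_seq)


if __name__ == "__main__":
    features = pair_features("MKTAYIAK", "AAK")
    print(len(features))
```

Here the first 20 values describe the target protein and the last 20 the peptide, so the vectors can be passed to any off-the-shelf binary classifier. Composition discards residue order, so this is a weak baseline rather than a recommended model.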

Dataset Statistics: 34 positive ligands, 4446 putative binders, 3 proteins

Dataset Split: Random Split (stratified)
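A stratified random split can be sketched in plain Python as follows. This is a hypothetical illustration, not TDC's implementation: the stratification key (for example the binary label or the target protein) is an assumption, since the key TDC stratifies on is not stated here, and `stratified_split` is a hypothetical helper name. Each stratum is shuffled with a seeded generator and contributes roughly `train_frac` of its rows to the training set.

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import Hashable, Sequence


def stratified_split(
    keys: Sequence[Hashable], train_frac: float, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices), splitting each stratum separately.

    keys[i] is the stratification key of row i (assumed, e.g. its label).
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group row indices by their stratification key.
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)

    train: list[int] = []
    test: list[int] = []
    for members in groups.values():
        rng.shuffle(members)
        # Keep at least one member of every stratum in the training set.
        n_train = max(1, round(len(members) * train_frac))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [1] * 30 + [0] * 70  # hypothetical toy labels, not from the dataset
    train_idx, test_idx = stratified_split(labels, train_frac=0.1, seed=0)
    print(len(train_idx), len(test_idx))  # → 10 90
```

On this toy set, 3 positives and 7 negatives land in the training set, so the 10/90 train/test partition keeps the class ratio of the full set.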

from tdc.multi_pred import ProteinPeptide
# Load the dataset; files are downloaded to and cached in ./data
data = ProteinPeptide(name = 'brown_mdm2_ace2_12ca5', path = './data')
# Retrieve the data split
split = data.get_split()


Note: In the benchmark associated with this dataset, only 10% of the data is provided as a training set for fine-tuning; the remaining 90% is used as the test set. Outside of the benchmark, users may define splits as desired.


[1] Ye X, Lee YC, Gates ZP, Ling Y, Mortensen JC, Yang FS, Lin YS, Pentelute BL. Binary combinatorial scanning reveals potent poly-alanine-substituted inhibitors of protein-protein interactions. Commun Chem. 2022 Oct 14;5(1):128. doi: 10.1038/s42004-022-00737-w. PMID: 36697672; PMCID: PMC9814900.

Dataset License: CC BY 4.0.