Interpretation

Dataset description

MHCPEP dataset (Reference list)
This dataset comes from the MHCPEP database, a curated collection of human and mouse MHC-binding peptides. It comprises more than 13,000 peptide entries gathered from published reports and from direct submissions of experimental data.

MHC-I datasets

IEDB datasets (Training dataset 1 and 2) (Reference)
Datasets of high-quality MHC class I binding peptides (9mers and 10mers) obtained from the Immune Epitope Database (IEDB). The binding affinities (IC₅₀) of these peptides were measured quantitatively in immunological experiments and then scaled to binding scores ranging from 0 to 100, where scores >=80 indicate strong binders, 50-79 moderate to low binders, and <50 non-binders. These datasets can be used as training data for computational models that predict MHC-I binding peptides.
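The score thresholds described above can be expressed as a small helper. This is a minimal sketch, not part of the datasets themselves: the function name, the example peptide labels, and their scores are hypothetical, and treating every score from 50 up to (but not including) 80 as "moderate to low" is an assumption about how non-integer scores between 79 and 80 should be handled.

```python
def classify_binding_score(score: float) -> str:
    """Map a scaled binding score (0 to 100) to a binder class.

    Thresholds follow the dataset description: >=80 strong binder,
    50-79 moderate to low binder, <50 non-binder.
    Assumption: scores in [79, 80) count as moderate to low binders.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "strong binder"
    if score >= 50:
        return "moderate to low binder"
    return "non-binder"


# Hypothetical (label, score) pairs, for illustration only.
examples = [("peptide_A", 92.0), ("peptide_B", 63.5), ("peptide_C", 12.0)]
for label, score in examples:
    print(label, score, classify_binding_score(score))
```

The same thresholds are used for the other scaled datasets on this page that report scores in the 0 to 100 range.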
CBS datasets (Training dataset 3) (Reference)
The CBS (Center for Biological Sequence Analysis, Technical University of Denmark) datasets contain high-quality MHC class I 9mer binding peptides. The binding affinities (IC₅₀) of these peptides were measured quantitatively in immunological experiments and then scaled to binding scores ranging from 0 to 100, where scores >=80 indicate strong binders, 50-79 moderate to low binders, and <50 non-binders. These datasets can be used as training data for computational models that predict MHC-I binding peptides.
Multipred and Hotspot Hunter datasets

MULTIPRED

Hotspot Hunter

Validation dataset 1: Survivin datasets (Reference)
Derived from a full overlapping study of 134 nonamer peptides spanning the full length of the tumor antigen survivin (Swiss-Prot: O15392). The original binding scores were measured by the iTopia^TM Epitope Discovery System.

Validation dataset 2: CMV datasets (Reference)
Each dataset contains 42 peptides spanning a 50-amino-acid construct that contains peptides from the cytomegalovirus (CMV) internal matrix protein pp65. The original binding scores were measured with the iTopia^TM Epitope Discovery System and then scaled to scores ranging from 0 to 100. These datasets can be used as validation data for computational models that predict MHC-I binding peptides.
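The peptide counts in the overlapping studies follow from sliding a fixed-length window one residue at a time: a sequence of length L yields L - k + 1 peptides of length k. The sketch below illustrates this under stated assumptions: the function name is hypothetical, the 50-residue string is a placeholder and not the actual CMV construct, and the window length of 9 is assumed because 50 - 9 + 1 = 42 matches the count given above. By the same arithmetic, 134 nonamers correspond to a 142-residue sequence.

```python
def overlapping_peptides(sequence: str, length: int = 9) -> list[str]:
    """Return every overlapping window of `length` residues, stepping by one."""
    if length <= 0:
        raise ValueError("length must be positive")
    # range() is empty when the sequence is shorter than the window.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Placeholder 50-residue sequence (NOT the real CMV pp65 construct).
construct = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEFGHIKL"
peptides = overlapping_peptides(construct, length=9)

print(len(construct), len(peptides))
print(peptides[0], peptides[-1])
```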
Validation dataset 3: 5T4 datasets (Reference)
Each dataset contains 206 peptides spanning the full length of the tumor-associated antigen 5T4. The original binding measurements were expressed as a percentage of binding affinity relative to control peptides and were then scaled to scores ranging from 0 to 100. These datasets can be used as validation data for computational models that predict MHC-I binding peptides.
Validation dataset 4: MLI competition 9mer datasets
These datasets contain lists of 9mer peptides whose original binding measurements are IC₅₀ values. The values were scaled to binding scores ranging from 0 to 100, where scores >=80 indicate strong binders, 50-79 moderate to low binders, and <50 non-binders. These datasets can be used as validation data for computational models that predict MHC-I binding peptides.
Validation dataset 5: MLI competition 10mer datasets
These datasets contain lists of 10mer peptides whose original binding measurements are IC₅₀ values. The values were scaled to binding scores ranging from 0 to 100, where scores >=80 indicate strong binders, 50-79 moderate to low binders, and <50 non-binders. These datasets can be used as validation data for computational models that predict MHC-I binding peptides.
Combined validation dataset (9mer): Combination of all the 9mer validation datasets
It can be used as a validation dataset for computational models that predict MHC-I binding peptides.

MHC-II datasets

IEDB dataset (Reference)
A dataset of high-quality MHC class II binding peptides obtained from the Immune Epitope Database (IEDB). The binding affinities (IC₅₀) of the peptides were measured by their ability to inhibit the binding of a radiolabeled standard peptide. They were then scaled to binding scores ranging from 0 to 100, where scores >=80 indicate strong binders, 50-79 moderate to low binders, and <50 non-binders. This dataset can be used as training data for computational models that predict MHC-II binding peptides.
Validation dataset (References)
Binding peptides derived from four protein antigens. This dataset can be used as validation data for computational models that predict MHC-II binding peptides.