Dataset description
- MHCPEP dataset (Reference list)
This dataset comes from the MHCPEP database, a curated collection of human and
mouse MHC-binding peptides. It comprises more than 13,000 peptide
entries, gathered from published reports as well as from direct submissions
of experimental data.
MHC-I datasets
- IEDB datasets (Training datasets 1 and 2) (Reference)
Datasets of high-quality MHC class I binding peptides (9mers and 10mers) obtained from the Immune
Epitope Database (IEDB). The binding affinities (IC50) of these
peptides were measured quantitatively in immunological experiments and
then scaled to binding scores ranging from 0 to 100: scores >= 80 indicate
strong binders, 50-79 moderate to low binders, and < 50
non-binders. These datasets can be used as training data to develop
computational models that predict MHC-I binding peptides.
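The same three-way split of the 0-100 score recurs across the IC50-derived datasets in this section, so it can be made explicit with a small helper. This is a minimal sketch; the function name `binder_class` and the label strings are illustrative, and non-integer scores between 79 and 80 are assumed to fall in the moderate-to-low class (i.e. 50 <= score < 80).

```python
def binder_class(score: float) -> str:
    """Map a scaled 0-100 binding score to the three classes used in this section."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must lie in [0, 100], got {score}")
    if score >= 80:
        return "strong"        # >= 80: strong binder
    if score >= 50:
        return "moderate-low"  # 50-79, treated here as 50 <= score < 80
    return "non-binder"        # < 50


for s in (95, 80, 79, 50, 49.5):
    print(s, binder_class(s))
```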
- CBS datasets (Training dataset 3) (Reference)
CBS (Center for Biological Sequence Analysis, Technical University of Denmark) datasets contain high-quality MHC class I 9mer binding peptides. The binding affinities (IC50) of these
peptides were measured quantitatively in immunological experiments and
then scaled to binding scores ranging from 0 to 100: scores >= 80 indicate
strong binders, 50-79 moderate to low binders, and < 50
non-binders. These datasets can be used as training data to develop
computational models that predict MHC-I binding peptides.
- Multipred and Hotspot Hunter datasets
A collection of peptides binding to the A2, A3, and B7 supertypes. The binding
affinities were originally expressed as scores ranging from 1 to 9 and were then
scaled to scores ranging from 0 to 100, as for the previous datasets. The dataset was
used to develop MULTIPRED and Hotspot Hunter,
and it can be used as an additional training dataset to develop computational
models that predict MHC-I binding peptides.
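The exact transform from the original 1-9 scale to 0-100 is not stated. As a minimal sketch, assuming a simple linear map in which 1 becomes 0 and 9 becomes 100, the rescaling would look like this; `rescale_1_to_9` is a hypothetical name, and the real datasets may have used a different mapping.

```python
def rescale_1_to_9(score: float) -> float:
    """Hypothetical linear rescaling of a 1-9 score onto the 0-100 range.

    Assumption: 1 maps to 0, 9 maps to 100, and values in between are spaced
    linearly. The transform actually applied to these datasets is not stated.
    """
    if not 1 <= score <= 9:
        raise ValueError(f"score must lie in [1, 9], got {score}")
    return (score - 1) / (9 - 1) * 100.0


print([rescale_1_to_9(s) for s in (1, 5, 9)])  # [0.0, 50.0, 100.0]
```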
- Validation dataset 1: Survivin datasets (Reference)
Derived from a study of 134 overlapping nonamer peptides spanning the
full length of the tumor antigen survivin (Swiss-Prot: O15392).
The original binding scores were measured with the
iTopia™ Epitope Discovery System and then scaled to scores ranging
from 0 to 100. These datasets can be used as validation data for computational models
that predict MHC-I binding peptides.
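A protein of length L contains L - k + 1 overlapping k-mers when the window advances one residue at a time. Assuming the survivin peptides were taken this way (which "overlapping" and "full length" suggest, though the step size is not stated), the count is consistent: survivin is 142 residues long, and 142 - 9 + 1 = 134. The same arithmetic gives 50 - 9 + 1 = 42 for the CMV construct described next. The sketch below generates such windows; `overlapping_kmers` is a hypothetical helper, and the placeholder sequences only have the stated lengths.

```python
def overlapping_kmers(sequence: str, k: int = 9) -> list[str]:
    """Return every length-k window of `sequence`, sliding one residue at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length L yields L - k + 1 windows (none if L < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Placeholder sequences of the stated lengths; the real survivin and pp65
# sequences are not reproduced here.
print(len(overlapping_kmers("A" * 142)))  # 134
print(len(overlapping_kmers("A" * 50)))   # 42
print(overlapping_kmers("MAEGT", k=3))    # ['MAE', 'AEG', 'EGT']
```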
- Validation dataset 2: CMV datasets (Reference)
Each dataset contains 42 peptides spanning a 50-amino-acid construct containing
peptides from the cytomegalovirus (CMV) internal matrix protein pp65. The original
binding scores were measured with the iTopia™ Epitope Discovery System
and then scaled to scores ranging from 0 to 100. These datasets can be used as validation
data for computational models that predict MHC-I binding peptides.
- Validation dataset 3: 5T4 datasets (Reference)
Each dataset contains 206 peptides spanning the full length of the tumor-associated antigen 5T4. The original
binding measurements were expressed as the percentage of binding affinity relative to control peptides and were then scaled to scores ranging from 0 to 100. These datasets can be used as validation data for computational models that predict MHC-I binding peptides.
- Validation dataset 4: MLI competition 9mer datasets
These datasets contain 9mer peptides whose original binding measurements are IC50 values. The measurements were
then scaled to binding scores ranging from 0 to 100: scores >= 80 indicate
strong binders, 50-79 moderate to low binders, and < 50
non-binders. These datasets can be used as validation data for
computational models that predict MHC-I binding peptides.
- Validation dataset 5: MLI competition 10mer datasets
These datasets contain 10mer peptides whose original binding measurements are IC50 values. The measurements were
then scaled to binding scores ranging from 0 to 100: scores >= 80 indicate
strong binders, 50-79 moderate to low binders, and < 50
non-binders. These datasets can be used as validation data for
computational models that predict MHC-I binding peptides.
- Combined validation dataset (9mer): a combination of all the 9mer validation datasets
It can be used as a validation dataset for computational models that predict
MHC-I binding peptides.
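Pooling the 9mer validation sets amounts to concatenating their rows while keeping only 9-residue peptides. The sketch below also tags each row with its source so that per-dataset results remain recoverable; the `Record` type, the `combine_9mer` function, the dataset names, and the toy rows are all hypothetical and do not reflect the files' actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    peptide: str
    score: float  # scaled 0-100 binding score
    source: str   # name of the validation dataset the row came from


def combine_9mer(datasets: dict[str, list[tuple[str, float]]]) -> list[Record]:
    """Pool several validation sets, keeping only 9-residue peptides."""
    combined: list[Record] = []
    for name, rows in datasets.items():
        for peptide, score in rows:
            if len(peptide) == 9:
                combined.append(Record(peptide, score, name))
    return combined


# Hypothetical dataset names and toy rows, for illustration only.
pooled = combine_9mer({
    "survivin": [("AAAAAAAAA", 62.0)],
    "mli_10mer": [("CCCCCCCCCC", 85.0)],
})
print(len(pooled), pooled[0].source)  # 1 survivin
```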
MHC-II datasets
- IEDB dataset (Reference)
A dataset of high-quality MHC class II binding peptides obtained from the Immune
Epitope Database (IEDB). The binding affinities (IC50) of the peptides
were measured by their ability to inhibit the binding of a radiolabeled standard
peptide and were scaled to binding scores ranging from 0 to 100: scores >= 80 indicate
strong binders, 50-79 moderate to low binders, and < 50
non-binders. This dataset can be used as a training
dataset to develop computational models that predict MHC-II binding peptides.
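In a competition assay of this kind, IC50 is the concentration of test peptide that reduces binding of the labeled standard by 50%, so a lower IC50 means stronger binding. As a minimal sketch of how such a value can be read off dose-response data, the code below interpolates linearly on log10(concentration) between the two measurements that bracket 50% inhibition. This simplification is assumed for illustration only (such assays are often analyzed by fitting a dose-response curve), and `ic50_by_interpolation` and the toy data are hypothetical, not the procedure behind the IEDB values.

```python
import math
from typing import Optional, Sequence


def ic50_by_interpolation(concs_nm: Sequence[float],
                          inhibition_pct: Sequence[float]) -> Optional[float]:
    """Estimate IC50 (nM) by linear interpolation on log10(concentration).

    concs_nm: increasing test-peptide concentrations (all > 0).
    inhibition_pct: % inhibition of the radiolabeled standard's binding at each
    concentration. Returns None if no adjacent pair brackets 50%.
    """
    points = list(zip(concs_nm, inhibition_pct))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 <= 50.0 <= i2 and i2 > i1:
            frac = (50.0 - i1) / (i2 - i1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Toy data: 50% inhibition lies between 10 nM (30%) and 100 nM (70%).
print(round(ic50_by_interpolation([1, 10, 100, 1000], [10, 30, 70, 90]), 1))  # 31.6
```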
- Validation dataset (References)
Binding peptides derived from four protein antigens. It can be used as a
validation dataset for computational models that predict MHC-II binding peptides.