license: mit
tags:
- biology
Model description
MHC-II-EpiPred (MHC II molecular epitope prediction) is a protein language model fine-tuned from the pretrained ESM2 model (facebook/esm2_t33_650M_UR50D) on a T-cell MHC II epitope dataset.
MHC-II-EpiPred is a classification model: given a peptide sequence, it predicts whether the sequence is an MHC II epitope (positive) or not (negative).
Dataset
The original data was downloaded from the IEDB database at https://www.iedb.org/home_v3.php.
The full data can be downloaded at https://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip
The full dataset comprises 543,717 T-cell epitope entries spanning a variety of species and infections caused by diverse viruses. The epitope information covers a broad range of sources, including data relevant to disease immunotherapy.
The final dataset used to train the model contains 60,256 samples (positive and negative) and is stored at https://github.com/pengsihua2023/MHC-II-EpiPred/tree/main/data.
Results
MHC-II-EpiPred achieved the following results:
Training Loss (cross-entropy loss, CEL): 0.1407
Training Accuracy: 0.9898
Evaluation Loss (cross-entropy loss, CEL): 0.0836
Evaluation Accuracy: 0.9703
Epochs: 324
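For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how cross-entropy loss and accuracy are typically computed for a classifier. It is not the project's evaluation code; the function names and the toy predictions are assumptions made up for illustration.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i][c] is the predicted probability of class c."""
    eps = 1e-12  # guard against log(0)
    total = sum(-math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return total / len(labels)

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return correct / len(labels)

# Toy predictions invented for illustration (class 1 = epitope is an assumption)
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
toy_labels = [0, 1, 1, 1]

print("loss:", round(cross_entropy(toy_probs, toy_labels), 4))
print("accuracy:", accuracy(toy_probs, toy_labels))
```

A lower loss means the predicted probabilities sit closer to the true labels, while accuracy only counts whether the top-scoring class is correct.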
Model training code on GitHub
https://github.com/pengsihua2023/MHC-II-EpiPred
How to use MHC-II-EpiPred
An example
The PyTorch and transformers libraries must be installed on your system.
Install PyTorch
pip install torch torchvision torchaudio
Install transformers
pip install transformers
Run the following code
Coming soon!
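Until the official example is published, a rough sketch of inference with the transformers library might look like the following. It is not the author's code. It assumes the fine-tuned checkpoint was saved with a sequence-classification head, that class index 1 means "epitope", and that `MODEL_ID` is replaced with the real model path; `MODEL_ID`, `LABELS`, `softmax`, `classify`, and `predict` are all hypothetical names introduced here.

```python
import math

# ASSUMPTION: placeholder path; replace with the actual model repository or local directory.
MODEL_ID = "path/to/MHC-II-EpiPred"
# ASSUMPTION: index 0 = negative (non-epitope), index 1 = positive (epitope).
LABELS = ("non-epitope", "epitope")

def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels=LABELS):
    """Return the most likely label and its probability for one sequence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def predict(peptides, model_id=MODEL_ID):
    """Run the classifier on a list of peptide strings."""
    # Heavy imports stay local so the helpers above work without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(peptides, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return [classify(row) for row in logits.tolist()]
```

With a valid model path, `predict(["PKYVKQNTLKLAT"])` would return a list containing one `(label, probability)` pair; the peptide here is only an illustrative input.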
Funding
This project was funded by a CDC award to Justin Bahl (BAA 75D301-21-R-71738).
Model architecture, coding and implementation
Sihua Peng
Group, Department and Institution
Lab: Justin Bahl
Department: Department of Infectious Diseases, College of Veterinary Medicine
Institution: The University of Georgia