---
license: mit
datasets:
- AmelieSchreiber/cafa5_pickle_split
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
library_name: transformers
tags:
- esm
- esm2
- protein language model
- biology
- cafa5
---
# ESM-2 Pre-finetuned on CAFA-5 for Protein Function Prediction

This model was pre-finetuned on the CAFA-5 protein function prediction task for four epochs.

It is meant to be fine-tuned in a second stage of training with Low-Rank Adaptation (LoRA).

The training script for both the pre-finetuning and the second-stage LoRA fine-tuning is
[available here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_lora_cafa5/blob/main/cafa_5_finetune_v2.ipynb).

This notebook allows you to pre-finetune the base model and then train a LoRA in the second stage.
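For orientation, here is a minimal sketch of what that second-stage setup could look like with the `peft` library. The checkpoint path, the label count, and the LoRA hyperparameters below are placeholders for illustration, not values taken from the linked notebook.

```python
# Minimal sketch of the second-stage LoRA setup (placeholder values, not the notebook's).
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

NUM_GO_TERMS = 1500  # placeholder: size of your GO-term label set

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/this-pre-finetuned-checkpoint",  # placeholder for this model's repo ID
    num_labels=NUM_GO_TERMS,
    problem_type="multi_label_classification",
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # assumed rank, tune for your setup
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # ESM-2 attention projection layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights remain trainable
```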
Note that the second stage of training is a harder curriculum for the model: it uses class weights so that the
model better captures the hierarchical (weighted) structure of the Gene Ontology (GO) terms that serve as
the labels in the multilabel sequence classification task of predicting a protein's functions.
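To illustrate the class-weighting idea, here is a hedged sketch using PyTorch's `BCEWithLogitsLoss`. The toy counts and the inverse-frequency weighting scheme are assumptions for illustration; the actual weights are computed in the notebook.

```python
# Sketch of a class-weighted multilabel loss over GO terms.
# The counts below are toy values; the notebook derives the real weights.
import torch
import torch.nn as nn

num_proteins = 5000.0
label_counts = torch.tensor([1200.0, 45.0, 300.0])  # proteins annotated with each GO term

# Up-weight rare GO terms: pos_weight_i = negatives_i / positives_i
pos_weight = (num_proteins - label_counts) / label_counts
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(4, 3)                     # (batch, num_go_terms) model outputs
targets = torch.randint(0, 2, (4, 3)).float()  # multi-hot GO-term labels
loss = loss_fn(logits, targets)                # errors on rare terms cost more
```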
|