bert-linnaeus-ner
This model is a fine-tuned version of bert-base-cased on the linnaeus dataset. It achieves the following results on the evaluation set:
- Loss: 0.0073
- Precision: 0.9223
- Recall: 0.9522
- F1: 0.9370
- Accuracy: 0.9985
Model description
This model performs named entity recognition (NER): it tags mentions of organisms and species in text.
Note: this model is a work in progress and is subject to change.
Intended uses & limitations
This model is intended for use in my Master's thesis, where it masks the names of bacteria (and phages) in text ahead of further analysis.
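The masking step described above might look roughly like the sketch below. It is not the thesis code. It assumes the model is loaded through the Transformers `pipeline` API with `aggregation_strategy="simple"`, which returns one dict per entity with `start` and `end` character offsets. The helper names `mask_entities` and `tag_and_mask` and the `[MASK]` placeholder are hypothetical, the repository ID is the one shown on this page, and overlapping spans are assumed not to occur.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict], mask_token: str = "[MASK]") -> str:
    """Replace each entity span (character offsets) in `text` with `mask_token`."""
    # Work from right to left so that earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + mask_token + text[ent["end"]:]
    return text


def tag_and_mask(text: str, model_id: str = "mikrz/bert-linnaeus-ner") -> str:
    """Run the NER pipeline and mask what it finds (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    ner = pipeline("token-classification", model=model_id, aggregation_strategy="simple")
    return mask_entities(text, ner(text))


# Hand-written spans in the pipeline's output format, for illustration only.
# These are NOT actual predictions from the model.
example = "Escherichia coli was infected with phage T4."
spans = [{"start": 0, "end": 16}, {"start": 35, "end": 43}]
print(mask_entities(example, spans))
```

Keeping the string replacement separate from the model call means the masking logic can be checked without downloading any weights.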
Training and evaluation data
The linnaeus dataset was used for both training and validation.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
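To make the `linear` scheduler setting concrete, here is a small sketch of how the learning rate would decay over the run. It assumes no warmup (none is listed above) and takes the total of 4476 optimizer steps from the training results table. The function `linear_lr` is a hypothetical helper written for illustration, not a Transformers API.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 4476,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr (unused here: warmup is assumed to be 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (1492 steps per epoch in the results table).
for step in (0, 1492, 2984, 4476):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05 and reaches zero at the final step.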
Training results
Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|
0.0076 | 1.0 | 1492 | 0.0128 | 0.8566 | 0.9578 | 0.9044 | 0.9967 |
0.0024 | 2.0 | 2984 | 0.0082 | 0.9092 | 0.9578 | 0.9329 | 0.9980 |
0.0007 | 3.0 | 4476 | 0.0073 | 0.9223 | 0.9522 | 0.9370 | 0.9985 |
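As a quick consistency check on the table, F1 can be recomputed from the reported precision and recall as their harmonic mean, F1 = 2PR / (P + R). The sketch below uses only the values from the table; the function name `f1_score` is a local helper defined here, not a library import.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for epochs 1 to 3, copied from the table above.
rows = [
    (0.8566, 0.9578, 0.9044),
    (0.9092, 0.9578, 0.9329),
    (0.9223, 0.9522, 0.9370),
]

for epoch, (p, r, reported) in enumerate(rows, start=1):
    print(f"epoch {epoch}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported:.4f}")
```

Because the inputs are already rounded to four decimals, small differences in the last digit would be expected rather than a sign of an error.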
Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0
Model tree for mikrz/bert-linnaeus-ner
Base model: google-bert/bert-base-cased
Dataset used to train mikrz/bert-linnaeus-ner: linnaeus