|
--- |
|
library_name: transformers |
|
tags: |
|
- code |
|
- not-for-all-audiences |
|
license: apache-2.0 |
|
datasets: |
|
- Locutusque/hercules-v1.0 |
|
- Open-Orca/OpenOrca |
|
language: |
|
- en |
|
base_model: mistralai/Mistral-7B-v0.1 |
|
--- |
|
|
|
# Hercules-1.0-Mistral-7B |
|
data:image/s3,"s3://crabby-images/d3ca3/d3ca39b1a36d4f047bff14c7baa3e5cbfa250744" alt="Hercules" |
|
|
|
## Model description |
|
|
|
Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model. |
|
|
|
Designed to be a turbo-charged version of teknium's OpenHermes through augmented data sources. This model outperforms OpenHermes-7B by 8 points on average, and OpenHermes-13B by 4 points on average. I will continue working on this model to hopefully outperform OpenHermes 2.5. |
|
|
|
This model is on par with OpenHermes 2. |
|
|
|
Apart from higher performance over OpenHermes, this model has data and training transparency for reproducibility. |
|
|
|
You can learn more about the Hercules dataset here: data:image/s3,"s3://crabby-images/bd075/bd07555c3b9e8f36f0873e43309b2293d05a16d1" alt="Locutusque/hercules-v1.0" |
|
|
|
During training, the dataset is split into a test set of 100 examples. At the end of training (120,000 examples), this model achieved a test loss of 0.57. |
|
|
|
### Training details |
|
|
|
- This model was trained on 8 kaggle TPUs, using torch xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!) |
|
- A learning rate of 2e-06 with the Adam optimizer. No LR scheduler was used. A low learning rate was used to prevent exploding gradients. |
|
- No mixed precision was used, with the default dtype being bfloat16. |
|
- Trained on both full subsets of OpenOrca, and 120,000 examples of Hercules. (If you wish to reproduce this model, you can limit redundancy by fine-tuning Open-Orca/Mistral-7B-OpenOrca on Hercules-v1.0) |
|
- No model parameters were frozen. |
|
- This model was trained on OpenAI's ChatML prompt format. |
|
|
|
## Inference examples |
|
|
|
data:image/s3,"s3://crabby-images/409ce/409ce25c4d113ac517bbef7ce87f7ba7c17cb46a" alt="image/png" |
|
data:image/s3,"s3://crabby-images/d4ff0/d4ff0a3fa84d4d094206a9914c2c12b03d8ec029" alt="image/png" |
|
|
|
|
|
data:image/s3,"s3://crabby-images/202a9/202a9e16a1191ad9931e9d486a2a82003db1d892" alt="image/png" |
|
|
|
|
|
data:image/s3,"s3://crabby-images/3cc9a/3cc9ae0f8b23879a38589b209ec1c452d598c14c" alt="image/png" |
|
|
|
|
|
data:image/s3,"s3://crabby-images/5b698/5b698151cda9ae23f4348232d0379f8303b9da41" alt="image/png" |
|
|