---
language:
- en
license: mit
---

# Nape-0
|
Nape is a series of small models that aim to exhibit strong capabilities for their size.

The model is still in training; this is a very early preview.

You can load it as follows:
|
```python
from transformers import LlamaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nnpy/Nape-0")
model = LlamaForCausalLM.from_pretrained("nnpy/Nape-0")
```
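
Once loaded, text generation goes through the standard `generate` API. The snippet below is only a minimal sketch; the prompt string and sampling settings are illustrative, not recommendations from the model author (see the prompt format under Training for chat-style use):

```python
# Minimal, illustrative generation example.
inputs = tokenizer("user: What is the Nape series?\nassistant:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```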
|
|
## Training

Training took 1 day for 3 epochs on 4x A6000 GPUs using native DeepSpeed.
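
The training script and DeepSpeed configuration are not published in this repo. Purely as an illustration of what a native DeepSpeed fine-tuning loop on a multi-GPU setup can look like, here is a sketch; every config value and the `train_dataloader` are assumptions, not the actual Nape-0 settings:

```python
import deepspeed
from transformers import LlamaForCausalLM

# Illustrative config only -- not the real Nape-0 training settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
}

model = LlamaForCausalLM.from_pretrained("nnpy/Nape-0")
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for epoch in range(3):
    for batch in train_dataloader:  # assumed dataloader of input_ids/attention_mask/labels
        batch = {k: v.to(engine.device) for k, v in batch.items()}
        loss = engine(**batch).loss
        engine.backward(loss)  # DeepSpeed scales/accumulates gradients internally
        engine.step()
```

A run like this would typically be launched with the DeepSpeed launcher, e.g. `deepspeed --num_gpus=4 train.py`.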
|
The following prompt format was used during training:

```
assistant role: You are Semica, a helpful AI assistant.
user: {prompt}
assistant:
```
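
For inference in the same format, the `{prompt}` placeholder can be filled in before tokenization. The helper below is hypothetical (not part of this repo) and assumes the turns are separated by newlines:

```python
def build_prompt(user_message: str) -> str:
    # Fill the {prompt} placeholder of the template shown above.
    return (
        "assistant role: You are Semica, a helpful AI assistant.\n"
        f"user: {user_message}\n"
        "assistant:"
    )

text = build_prompt("Summarize the Nape series in one sentence.")
inputs = tokenizer(text, return_tensors="pt")
```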
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_nnpy__Nape-0).
|
| Metric               | Value |
|----------------------|-------|
| Avg.                 | 30.93 |
| ARC (25-shot)        | 32.68 |
| HellaSwag (10-shot)  | 58.68 |
| MMLU (5-shot)        | 24.88 |
| TruthfulQA (0-shot)  | 38.99 |
| Winogrande (5-shot)  | 57.3  |
| GSM8K (5-shot)       | 0.08  |
| DROP (3-shot)        | 3.89  |