Update README.md

README.md

---
library_name: transformers
license: llama2
datasets:
- aqua_rat
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
- anon8231489123/ShareGPT_Vicuna_unfiltered
---

## Description

This repo contains GGUF format model files for Llama-3-Smaug-8B.

## Files Provided

| Name                       | Quant | Bits | File Size | Remark                          |
| -------------------------- | ----- | ---- | --------- | ------------------------------- |
| llama-3-smaug-8b.Q2_K.gguf | Q2_K  | 2    | 3.18 GB   | 2.96G, +3.5199 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q3_K.gguf | Q3_K  | 3    | 4.02 GB   | 3.74G, +0.6569 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q4_0.gguf | Q4_0  | 4    | 4.66 GB   | 4.34G, +0.4685 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q4_K.gguf | Q4_K  | 4    | 4.92 GB   | 4.58G, +0.1754 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q5_K.gguf | Q5_K  | 5    | 5.73 GB   | 5.33G, +0.0569 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q6_K.gguf | Q6_K  | 6    | 6.60 GB   | 6.14G, +0.0217 ppl @ Llama-3-8B |
| llama-3-smaug-8b.Q8_0.gguf | Q8_0  | 8    | 8.54 GB   | 7.96G, +0.0026 ppl @ Llama-3-8B |

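As a quick usage sketch (not part of the original card), the snippet below pulls one of the quants above with `huggingface_hub` and runs a single chat turn through `llama-cpp-python`. The repo id is a placeholder to replace with this repo's actual id; the file name comes straight from the table.

```python
# Hedged sketch: download one GGUF quant and chat with it via llama-cpp-python.
# "your-namespace/Llama-3-Smaug-8B-GGUF" is a placeholder repo id, not the real one.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="your-namespace/Llama-3-Smaug-8B-GGUF",  # placeholder: use this repo's id
    filename="llama-3-smaug-8b.Q4_K.gguf",           # any file name from the table above
)

llm = Llama(model_path=model_path, n_ctx=8192)  # 8192 matches max_pos_embed below
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what a GGUF file is in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```

Lower-bit quants shave file size at the cost of the perplexity delta shown in the Remark column, so Q4_K and Q5_K are common middle-ground choices.
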
## Parameters

| path                      | type  | architecture     | rope_theta | sliding_win | max_pos_embed |
| ------------------------- | ----- | ---------------- | ---------- | ----------- | ------------- |
| abacusai/Llama-3-Smaug-8B | llama | LlamaForCausalLM | 500000.0   | null        | 8192          |

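These values mirror the source model's Hugging Face config; as a sketch (assuming access to the `abacusai/Llama-3-Smaug-8B` repo), they can be cross-checked with `transformers`:

```python
# Sketch: cross-check the table above against the source model's config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("abacusai/Llama-3-Smaug-8B")
print(cfg.architectures)            # expected: ['LlamaForCausalLM']
print(cfg.rope_theta)               # expected: 500000.0
print(cfg.max_position_embeddings)  # expected: 8192
```
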
## Benchmarks

![](https://ibb.co.com/D83Mkyg)

## Specific Purpose Notes

# Original Model Card

---
library_name: transformers
license: llama2
datasets:
- aqua_rat
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
- anon8231489123/ShareGPT_Vicuna_unfiltered
---

# Llama-3-Smaug-8B

### Built with Meta Llama 3

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f95cac5f9ba52bbcd7f/OrcJyTaUtD2HxJOPPwNva.png)

This model was built using the Smaug recipe for improving performance on real-world multi-turn conversations, applied to
[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

### Model Description

- **Developed by:** [Abacus.AI](https://abacus.ai)
- **License:** https://llama.meta.com/llama3/license/
- **Finetuned from model:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

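Since the card declares `library_name: transformers`, the full-precision source model (as opposed to the GGUF files above) can be loaded the usual way. A minimal sketch, assuming enough GPU memory for the bf16 weights:

```python
# Sketch: load the full-precision model with transformers and run one chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Llama-3-Smaug-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 12 * 7 - 5?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
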
## Evaluation

### MT-Bench

```
########## First turn ##########
                                score
model                    turn
Llama-3-Smaug-8B         1    8.77500
Meta-Llama-3-8B-Instruct 1    8.31250
########## Second turn ##########
                                score
model                    turn
Meta-Llama-3-8B-Instruct 2    7.8875
Llama-3-Smaug-8B         2    7.8875
########## Average ##########
                             score
model
Llama-3-Smaug-8B          8.331250
Meta-Llama-3-8B-Instruct  8.10
```

| Model               | First turn | Second Turn | Average |
| :------------------ | ---------: | ----------: | ------: |
| Llama-3-Smaug-8B    |       8.78 |        7.89 |    8.33 |
| Llama-3-8B-Instruct |       8.31 |        7.89 |    8.10 |

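For clarity (this note is not in the original card), each Average is simply the mean of that model's two turn scores from the raw output above:

```python
# Mean of the two MT-Bench turn scores for Llama-3-Smaug-8B.
first_turn, second_turn = 8.77500, 7.8875
print((first_turn + second_turn) / 2)  # 8.33125, reported as 8.33
```
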
This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.