---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
---

This is a finetuned version of the base model [OpenHermes-2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), intended to be used with the trained Medusa heads in [OpenHermes-2.5-medusa](https://huggingface.co/omarelshehy/OpenHermes-2.5-Mistral-7B-medusa).

Since the base model and the Medusa heads were trained together, they should ideally be used together for the best performance.

WIP: replace this full model with an adapter on top of the original model.

# Training Details

The model and the heads were trained on a self-distilled dataset, obtained by running inference over the original dataset used to train [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B).

Inference over the dataset was done with the [vLLM](https://docs.vllm.ai/en/latest/index.html) async server on an A100.
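
As an illustration of what that distillation pass looks like, here is a minimal sketch using vLLM's offline `LLM` API instead of the async server; the prompts, sampling parameters, and output handling are assumptions for illustration, not the actual settings used.

```python
# Hypothetical sketch of the self-distillation pass (the actual run used
# vLLM's async server; prompts and sampling values here are assumptions).
from vllm import LLM, SamplingParams

# Load the original base model whose outputs are distilled.
llm = LLM(model="teknium/OpenHermes-2.5-Mistral-7B")

# Prompts would come from the original OpenHermes-2.5 dataset.
prompts = [
    "Explain the difference between a process and a thread.",
    "Write a Python function that reverses a linked list.",
]

# Low temperature keeps the regenerated targets close to the model's own
# distribution, which is the point of self-distillation.
params = SamplingParams(temperature=0.0, max_tokens=512)

outputs = llm.generate(prompts, params)
for out in outputs:
    # Each (prompt, generated text) pair becomes a training example
    # for the base model + Medusa heads.
    print(out.prompt, "->", out.outputs[0].text[:80])
```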

Training was done with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on a single A100 GPU using QLoRA for 2 epochs.
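
Axolotl is driven by a YAML config, but for readers unfamiliar with QLoRA, the sketch below shows an equivalent setup in plain transformers + peft. All hyperparameters (rank, alpha, target modules) are illustrative assumptions, and the Medusa-head-specific training logic is omitted.

```python
# Illustrative QLoRA setup with transformers + peft; Axolotl configures the
# equivalent from YAML. Hyperparameters are assumptions, not the actual values.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Small trainable low-rank adapters on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, a standard Trainer loop over the self-distilled dataset
# would run for 2 epochs, as described above.
```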

# Inference Evaluation

(This is still a WIP.)

I tested the model's latency using [TGI](https://huggingface.co/docs/text-generation-inference/en/index). As several people have reported, the speedup depends on the domain or task: in general I measured a 1.9x improvement in latency, while on code-related tasks it can reach 3x.
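
As a rough illustration of how such a comparison can be made, the sketch below times generation against two locally running TGI endpoints via `huggingface_hub.InferenceClient` (TGI's documented Python client); the endpoint URLs, prompt, and token budget are assumptions.

```python
# Rough latency comparison between two TGI endpoints, e.g. one serving the
# plain base model and one serving it with the Medusa heads.
# Endpoint URLs, prompt, and token budget are assumptions for illustration.
import time
from huggingface_hub import InferenceClient

ENDPOINTS = {
    "base": "http://localhost:8080",
    "medusa": "http://localhost:8081",
}
PROMPT = "Write a Python function that merges two sorted lists."

for name, url in ENDPOINTS.items():
    client = InferenceClient(url)
    start = time.perf_counter()
    client.text_generation(PROMPT, max_new_tokens=256)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.2f}s for 256 new tokens")
```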