---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
---
This is a base model finetuned from OpenHermes-2.5 to accompany the trained Medusa heads in OpenHermes-2.5-medusa.
The base model and the Medusa heads were trained together, so they should ideally be used together for the best performance.
WIP: replace this model with an adapter applied to the original model.
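For standard (non-speculative) generation, the base model can be loaded on its own with transformers. This is only a minimal sketch: the repo id is a placeholder, and the snippet does not exercise the Medusa heads, which require a Medusa-aware decoder or a serving stack with speculative decoding support.

```python
# Minimal sketch: load the base model for plain (non-speculative) generation.
# The repo id below is a placeholder, not a confirmed model path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/this-base-model"  # assumed; replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain speculative decoding in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```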
## Training Details
The model and the heads were trained on a self-distilled dataset, generated by running inference over the original dataset used to train https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B
Inference over the dataset was performed with a vLLM async server on an A100 GPU.
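As a rough illustration of that self-distillation step, the sketch below regenerates responses for prompts taken from the original dataset with vLLM. The dataset schema (`conversations` with `from`/`value` turns) and the use of vLLM's offline `LLM` class instead of the async server are assumptions made to keep the example compact.

```python
# Rough sketch of self-distillation: take prompts from the original dataset
# and regenerate the responses with the model itself.
# Dataset field names and the model id are assumptions; the actual run used
# vLLM's async server rather than the offline engine shown here.
from datasets import load_dataset
from vllm import LLM, SamplingParams

ds = load_dataset("teknium/OpenHermes-2.5", split="train")

# Keep only the first human turn of each conversation as the prompt (assumed schema).
prompts = [
    next(turn["value"] for turn in ex["conversations"] if turn["from"] == "human")
    for ex in ds.select(range(1000))  # small slice for illustration
]

llm = LLM(model="teknium/OpenHermes-2.5-Mistral-7B", dtype="float16")
params = SamplingParams(temperature=0.0, max_tokens=512)

outputs = llm.generate(prompts, params)
distilled = [
    {"prompt": p, "response": o.outputs[0].text}
    for p, o in zip(prompts, outputs)
]
```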
Training was performed with Axolotl on a single A100 GPU using QLoRA for 2 epochs.
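For readers unfamiliar with what an Axolotl QLoRA run boils down to, the sketch below shows a roughly equivalent setup in raw transformers/peft/bitsandbytes terms. It is only illustrative: the actual run was configured through Axolotl (which also handles the joint Medusa-head training), and every hyperparameter here is an assumption rather than the value used.

```python
# Illustrative QLoRA setup, roughly what an Axolotl qLoRA config resolves to.
# This is a stand-in, not the configuration actually used; all values are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=32,                       # assumed rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# ... run a Trainer over the self-distilled dataset for 2 epochs ...
```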
## Inference Evaluation
(This is still a WIP.) I tested the model's latency using TGI. As several people have reported, the speedup from Medusa depends on the domain or task. Overall I measured a 1.9x improvement in latency; on code-related tasks the improvement can reach 3x.
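A simple way to reproduce this kind of measurement is to time requests against a running TGI endpoint, as in the sketch below. The endpoint URL and prompts are assumptions, and absolute numbers will vary with hardware and workload.

```python
# Quick latency check against a running TGI endpoint.
# The URL and prompts are assumptions; code prompts tend to benefit the most
# from Medusa speculation.
import time
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local TGI server

prompts = [
    "Write a Python function that checks whether a string is a palindrome.",
    "Summarize the plot of Hamlet in three sentences.",
]

for prompt in prompts:
    start = time.perf_counter()
    resp = requests.post(
        TGI_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": 256}},
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    n_chars = len(resp.json()["generated_text"])
    print(f"{elapsed:.2f}s for {n_chars} generated characters")
```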