
This is the base model, fine-tuned from OpenHermes-2.5, that accompanies the trained Medusa heads OpenHermes-2.5-medusa.

The base model and the Medusa heads were trained together, so they should ideally be used together for best performance.

WIP: Replace this model with an adapter for the original model.

Training Details

The model and the heads were trained on a self-distilled dataset, generated by running inference over the original dataset used to train https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B

Inference on the dataset was done using a vLLM async server on an A100.
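The self-distillation step can be sketched roughly as follows. This is a minimal illustration and not the script used for this model: `generate` is a hypothetical stand-in for an async request to the inference server, and the names `self_distill`, `max_concurrency`, and the record fields are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for an async request to the inference server.
GenerateFn = Callable[[str], Awaitable[str]]


async def self_distill(
    prompts: list[str],
    generate: GenerateFn,
    max_concurrency: int = 8,
) -> list[dict]:
    """Pair each original prompt with the model's own response to it."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> dict:
        # Limit the number of requests in flight at any time.
        async with semaphore:
            response = await generate(prompt)
        return {"prompt": prompt, "response": response}

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(p) for p in prompts))


async def fake_generate(prompt: str) -> str:
    """Placeholder model call so the sketch runs without a server."""
    await asyncio.sleep(0)
    return f"model answer to: {prompt}"


records = asyncio.run(
    self_distill(["What is 2+2?", "Name a prime."], fake_generate)
)
```

In practice `fake_generate` would be swapped for a coroutine that queries the running server; the resulting records then replace the original answers as training targets.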

Training was performed with Axolotl on a single A100 GPU, using QLoRA for 2 epochs.

Inference evaluation

(This is still a WIP.) I tested the model's latency using TGI. As several people have reported, the speedup depends on the domain or task. Overall, I measured a 1.9x improvement in latency; on code-related tasks, the improvement can reach 3x.
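For reference, a latency improvement of this kind is the ratio of baseline latency to Medusa latency. A minimal sketch of that arithmetic follows; the helper name `speedup` is hypothetical, and the timings are made-up values chosen only for illustration, not measurements from this model.

```python
from statistics import mean


def speedup(baseline_latencies: list[float], medusa_latencies: list[float]) -> float:
    """Ratio of mean baseline latency to mean Medusa latency (>1 means faster)."""
    if not baseline_latencies or not medusa_latencies:
        raise ValueError("both latency lists must be non-empty")
    return mean(baseline_latencies) / mean(medusa_latencies)


# Illustrative numbers only, in seconds per request.
baseline = [4.0, 6.0]
medusa = [2.0, 3.0]
ratio = speedup(baseline, medusa)  # 5.0 / 2.5
```

Because the gain varies by task, it is worth computing this ratio separately per domain (for example, code versus general chat) instead of pooling all requests.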

Model size: 7.24B params (Safetensors, tensor type BF16)