---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
---
This is a fine-tuned base model derived from [OpenHermes-2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), paired with the trained Medusa heads in [OpenHermes-2.5-medusa](https://huggingface.co/omarelshehy/OpenHermes-2.5-Mistral-7B-medusa).

The base model and the Medusa heads were trained together, so they should ideally be used together for the best performance.

WIP: replace the full model with an adapter on top of the original model.
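The reason the pairing matters can be seen in how Medusa-style speculation works: tokens drafted by the heads are only kept when the base model would have produced them itself. The toy sketch below illustrates that acceptance logic with stand-in functions (the "model" here is hypothetical, not the real OpenHermes weights).

```python
# Toy sketch of Medusa-style verification: draft tokens proposed by the
# extra heads are checked against the base model's own next-token choices,
# and only the longest agreeing prefix is accepted.

def base_model_next(context):
    # Hypothetical deterministic base model: picks the next token id.
    return (sum(context) * 31 + 7) % 100

def accept_drafts(context, drafts):
    """Return the prefix of `drafts` that the base model agrees with."""
    accepted = []
    ctx = list(context)
    for tok in drafts:
        if base_model_next(ctx) != tok:
            break  # first disagreement ends speculation for this step
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Heads trained jointly with the base model tend to propose tokens it
# would have produced anyway, so more drafts are accepted per step.
ctx = [1, 2, 3]
well_aligned_drafts = []
c = list(ctx)
for _ in range(3):  # simulate well-aligned medusa heads
    t = base_model_next(c)
    well_aligned_drafts.append(t)
    c.append(t)
print(accept_drafts(ctx, well_aligned_drafts))  # all three drafts accepted
```

Heads trained separately from the base model would disagree more often, truncating the accepted prefix and reducing the speedup.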
# Training Details
The model and the heads were trained on a self-distilled dataset: responses were regenerated by running inference with the original [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) over its own training dataset.

Inference over the dataset was run with a [vLLM](https://docs.vllm.ai/en/latest/index.html) async server on an A100.

Training was performed with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on a single A100 GPU using QLoRA for 2 epochs.
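For orientation, an Axolotl QLoRA run of this kind is driven by a YAML config along the lines below. This is an illustrative sketch with assumed values only; the actual config used for this run (including any Medusa-head-specific settings) is not part of this card, and the dataset path shown is hypothetical.

```yaml
# Illustrative Axolotl QLoRA config (assumed values, not the actual run)
base_model: teknium/OpenHermes-2.5-Mistral-7B
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
datasets:
  - path: ./self_distilled_openhermes.jsonl  # hypothetical path to the self-distilled data
    type: sharegpt
sequence_len: 4096
num_epochs: 2
micro_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.0002
output_dir: ./openhermes-medusa-qlora
```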
# Inference evaluation
(This is still a WIP.)

I tested the model's latency using [TGI](https://huggingface.co/docs/text-generation-inference/en/index). As several people have reported, the speedup depends on the domain and task: on average I measured a 1.9x latency improvement, and on code-related tasks it can reach 3x.
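A TGI server for this kind of latency test could be launched roughly as follows. This is a sketch, not the exact command used for the numbers above: the `--speculate` value and the use of the Medusa repo as `--model-id` are assumptions about the setup.

```shell
# Sketch only: flags and repo layout are assumptions, not the exact command used.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id omarelshehy/OpenHermes-2.5-Mistral-7B-medusa \
  --speculate 3  # number of speculative tokens verified per step
```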