README.md · tinybiggames/Hermes-2-Pro-Llama-3-8B-Q4_K

Hermes-2-Pro-Llama-3-8B-Q4_K_M-GGUF / README.md

tinybiggames

Update README.md

7ef74a1 verified 7 months ago

preview code

raw

history blame contribute delete

2.37 kB

	---
	language:
	- en
	tags:
	- Llama-3
	- instruct
	- finetune
	- chatml
	- DPO
	- RLHF
	- gpt4
	- synthetic data
	- distillation
	- function calling
	- json mode
	- axolotl
	- llama-cpp
	- gguf-my-repo
	base_model: NousResearch/Meta-Llama-3-8B
	datasets:
	- teknium/OpenHermes-2.5
	widget:
	- example_title: Hermes 2 Pro
	messages:
	- role: system
	content: >-
	You are a sentient, superintelligent artificial general intelligence, here
	to teach and assist me.
	- role: user
	content: >-
	Write a short story about Goku discovering kirby has teamed up with Majin
	Buu to destroy the world.
	model-index:
	- name: Hermes-2-Pro-Llama-3-8B
	results: []
	---

	# tinybiggames/Hermes-2-Pro-Llama-3-8B-Q4_K_M-GGUF
	This model was converted to GGUF format from [`NousResearch/Hermes-2-Pro-Llama-3-8B`](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) for more details on the model.
	## Use with tinyBigGAMES's [Inference](https://github.com/tinyBigGAMES) Libraries.


	How to configure LMEngine:

	```Delphi
	InitConfig(
	'C:/LLM/gguf', // path to model files
	-1 // number of GPU layer, -1 to use all available layers
	);
	```

	How to define model:

	```Delphi
	DefineModel('hermes-2-pro-llama-3-8b.Q4_K_M.gguf',
	'hermes-2-pro-llama-3-8b.Q4_K_M', 8000, '<\|im_start\|>{role}\n{content}<\|im_end\|>\n',
	'<\|im_start\|>assistant');
	```

	How to add a message:

	```Delphi
	AddMessage(
	ROLE_USER, // role
	'What is AI?' // content
	);
	```

	`{role}` - will be substituted with the message "role"
	`{content}` - will be substituted with the message "content"

	How to do inference:

	```Delphi
	var
	LTokenOutputSpeed: Single;
	LInputTokens: Int32;
	LOutputTokens: Int32;
	LTotalTokens: Int32;

	if RunInference('hermes-2-pro-llama-3-8b.Q4_K_M', 1024) then
	begin
	GetInferenceStats(nil, @LTokenOutputSpeed, @LInputTokens, @LOutputTokens,
	@LTotalTokens);
	PrintLn('', FG_WHITE);
	PrintLn('Tokens :: Input: %d, Output: %d, Total: %d, Speed: %3.1f t/s',
	FG_BRIGHTYELLOW, LInputTokens, LOutputTokens, LTotalTokens, LTokenOutputSpeed);
	end
	else
	begin
	PrintLn('', FG_WHITE);
	PrintLn('Error: %s', FG_RED, GetError());
	end;
	```