Adding Evaluation Results

00940dd verified 3 months ago

5 kB

	---
	license: llama3.1
	library_name: transformers
	tags:
	- moe
	- frankenmoe
	- merge
	- mergekit
	base_model:
	- Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base
	- ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2
	model-index:
	- name: L3.1-Moe-2x8B-v0.2
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 73.48
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 32.95
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 15.26
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 6.71
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 11.38
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 31.76
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=moeru-ai/L3.1-Moe-2x8B-v0.2
	name: Open LLM Leaderboard
	---

	# L3.1-Moe-2x8B-v0.2

	![cover](https://github.com/moeru-ai/L3.1-Moe/blob/main/cover/v0.2.png?raw=true)

	This model is a Mixture of Experts (MoE) made with mergekit-moe. It uses the following base models:

	- [Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base](https://huggingface.co/Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base)
	- [ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2](https://huggingface.co/ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2)

	Heavily inspired by [mlabonne/Beyonder-4x7B-v3](https://huggingface.co/mlabonne/Beyonder-4x7B-v3).

	## Quantized models

	### GGUF by [mradermacher](https://huggingface.co/mradermacher)

	- [mradermacher/L3.1-Moe-2x8B-v0.2-i1-GGUF](https://huggingface.co/mradermacher/L3.1-Moe-2x8B-v0.2-i1-GGUF)
	- [mradermacher/L3.1-Moe-2x8B-v0.2-GGUF](https://huggingface.co/mradermacher/L3.1-Moe-2x8B-v0.2-GGUF)

	## Mergekit config

	<details>
	<summary>mergekit_moe_config.yml</summary>

	```yaml
	base_model: Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base
	gate_mode: hidden
	dtype: bfloat16
	experts:
	- source_model: Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base
	positive_prompts: &common_prompts
	- "chat"
	- "assistant"
	- "tell me"
	- "explain"
	- "I want"
	- "code"
	- "python"
	- "javascript"
	- "programming"
	- "algorithm"
	- "reason"
	- "math"
	- "mathematics"
	- "solve"
	- "count"
	negative_prompts: &rp_prompts
	- "storywriting"
	- "write"
	- "scene"
	- "story"
	- "character"
	- source_model: ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.2
	positive_prompts: *rp_prompts
	negative_prompts: *common_prompts
	```

	</details>

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_moeru-ai__L3.1-Moe-2x8B-v0.2)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|28.59\|
	\|IFEval (0-Shot) \|73.48\|
	\|BBH (3-Shot) \|32.95\|
	\|MATH Lvl 5 (4-Shot)\|15.26\|
	\|GPQA (0-shot) \| 6.71\|
	\|MuSR (0-shot) \|11.38\|
	\|MMLU-PRO (5-shot) \|31.76\|