Llama-Spark / README.md

Adding Evaluation Results (#3)

b491aa4 verified 6 months ago

5.05 kB

	---
	license: llama3.1
	model-index:
	- name: Llama-Spark
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 79.11
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 29.77
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 1.06
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 6.6
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 2.62
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 30.23
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
	name: Open LLM Leaderboard
	---
	<div align="center">
	<img src="https://i.ibb.co/9hwFrvL/BLMs-Wkx-NQf-W-46-FZDg-ILhg.jpg" alt="Arcee Spark" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
	</div>


	Llama-Spark is a powerful conversational AI model developed by Arcee.ai. It's built on the foundation of Llama-3.1-8B and merges the power of our Tome Dataset with Llama-3.1-8B-Instruct, resulting in a remarkable conversationalist that punches well above its 8B parameter weight class.

	## GGUFs available [here](https://huggingface.co/arcee-ai/Llama-Spark-GGUF)

	## Model Description

	Llama-Spark is our commitment to consistently delivering the best-performing conversational AI in the 6-9B parameter range. As new base models become available, we'll continue to update and improve Spark to maintain its leadership position.

	This model is a successor to our original Arcee-Spark, incorporating advancements and learnings from our ongoing research and development.

	## Intended Uses

	Llama-Spark is intended for use in conversational AI applications, such as chatbots, virtual assistants, and dialogue systems. It excels at engaging in natural and informative conversations.

	## Training Information

	Llama-Spark is built upon the Llama-3.1-8B base model, fine-tuned using of the Tome Dataset and merged with Llama-3.1-8B-Instruct.
	## Evaluation Results
	Please note that these scores are consistantly higher than the OpenLLM leaderboard, and should be compared to their relative performance increase not weighed against the leaderboard.
	<div align="center">
	<img src="https://i.ibb.co/pfSGLtB/Screenshot-2024-08-01-at-11-40-42-PM.png" alt="Arcee Spark" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
	</div>

	## Acknowledgements

	We extend our deepest gratitude to PrimeIntellect for being our compute sponsor for this project.

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_arcee-ai__Llama-Spark)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|24.90\|
	\|IFEval (0-Shot) \|79.11\|
	\|BBH (3-Shot) \|29.77\|
	\|MATH Lvl 5 (4-Shot)\| 1.06\|
	\|GPQA (0-shot) \| 6.60\|
	\|MuSR (0-shot) \| 2.62\|
	\|MMLU-PRO (5-shot) \|30.23\|