---
license: llama3.1
datasets:
- agentlans/crash-course
base_model:
- agentlans/Llama3.1-SuperDeepFuse
model-index:
- name: Llama3.1-SuperDeepFuse-CrashCourse12K
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: wis-k/instruction-following-eval
split: train
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 71.87
name: averaged accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: SaylorTwift/bbh
split: test
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 31.83
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: lighteval/MATH-Hard
split: test
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 17.67
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
split: train
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 8.39
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 8.6
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 29.24
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
name: Open LLM Leaderboard
---
# Llama3.1-SuperDeepFuse-CrashCourse12K
Llama3.1-SuperDeepFuse-CrashCourse12K is an 8B parameter language model based on [Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse)
and further fine-tuned on [agentlans/crash-course](https://huggingface.co/datasets/agentlans/crash-course).
## Model Details
- **Base Model**: Llama3.1-SuperDeepFuse (8B parameters)
- **Fine-tuning Dataset**: 12,000 samples from agentlans/crash-course (drawn from 10 high-quality instruction datasets)
- **Model Type**: Instruction-tuned language model
- **Language(s)**: Multilingual
- **License**: Llama 3.1 Community License
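A minimal usage sketch with the `transformers` text-generation pipeline (the model ID comes from this repo; the system prompt and generation settings are illustrative, not part of the model's training setup):

```python
model_id = "agentlans/Llama3.1-SuperDeepFuse-CrashCourse12K"
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def generate(prompt: str) -> str:
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype="auto",   # BF16 on supported hardware
        device_map="auto",
    )
    out = pipe(messages + [{"role": "user", "content": prompt}], max_new_tokens=256)
    # Chat-style input returns the full conversation; take the assistant's reply.
    return out[0]["generated_text"][-1]["content"]
```

Since this is a Llama 3.1 derivative, the tokenizer's built-in chat template is applied automatically by the pipeline.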
## Training Procedure
### Fine-tuning
- **Method**: LoRA (Low-Rank Adaptation)
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2 per device
- **Gradient Accumulation Steps**: 8
- **Training Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA Configuration**:
- Rank: 8
- Alpha: 16
- Dropout: 0.5
- Target: all layers
- **Quantization**: 4-bit (bitsandbytes)
- **Precision**: BF16
- **Other Techniques**: NEFTune (noise alpha: 5), RS-LoRA
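The two less common techniques in the table, RS-LoRA and NEFTune, can be sketched numerically. RS-LoRA replaces the classic LoRA scaling `alpha / r` with `alpha / sqrt(r)`, which keeps the update magnitude stable as rank grows; NEFTune adds uniform noise to input embeddings during training. The dimensions below are toy values for illustration; rank, alpha, and noise alpha match the configuration above:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64   # toy layer dimensions (real layers are much larger)
r, alpha = 8, 16       # LoRA rank and alpha from the configuration above

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

scale_lora = alpha / r                # classic LoRA scaling: 16 / 8 = 2.0
scale_rslora = alpha / np.sqrt(r)     # rank-stabilized scaling: 16 / sqrt(8)

# Effective weight at inference; zero-init B means no change before training.
W_eff = W + scale_rslora * (B @ A)

# NEFTune: uniform noise on input embeddings, applied during training only.
L, d_model = 32, 64                   # toy sequence length / embedding dim
emb = rng.standard_normal((L, d_model))
neft_alpha = 5                        # noise alpha from the configuration above
eps = neft_alpha / np.sqrt(L * d_model)
noisy_emb = emb + rng.uniform(-eps, eps, size=emb.shape)
```

In practice these are enabled via flags in the training framework (e.g. PEFT's `use_rslora` and a NEFTune noise-alpha option) rather than implemented by hand.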
## Performance and Limitations
This fine-tuning aims to provide:
- Enhanced multi-task reasoning
- Improved performance on mathematics and coding tasks
- Better instruction-following

However:
- As an 8B model, it may underperform larger variants
- It can produce misleading or incorrect outputs
- Outputs should be independently verified before use in critical applications
## Additional Information
- For the original model, see [agentlans/Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse)
- For the base Llama 3.1 model, including training data and model architecture, refer to the original [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model card.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Llama3.1-SuperDeepFuse-CrashCourse12K-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
| Metric |Value (%)|
|-------------------|--------:|
|**Average** | 27.93|
|IFEval (0-Shot) | 71.87|
|BBH (3-Shot) | 31.83|
|MATH Lvl 5 (4-Shot)| 17.67|
|GPQA (0-shot) | 8.39|
|MuSR (0-shot) | 8.60|
|MMLU-PRO (5-shot) | 29.24|