Aura-8B-EXL2 / README.md

Update README.md

34445e8 verified 13 days ago

6.61 kB

	---
	license: apache-2.0
	language:
	- en
	datasets:
	- FourOhFour/RP_Phase
	- Nitral-AI/Cybersecurity-ShareGPT
	- Nitral-AI/Medical_Instruct-ShareGPT
	- Nitral-AI/Olympiad_Math-ShareGPT
	- NewEden/Claude-Instruct-5K
	- lodrick-the-lafted/kalo-opus-instruct-3k-filtered
	- Nitral-AI/Creative_Writing-ShareGPT
	- jeiku/Writing
	- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
	base_model:
	- arcee-ai/Llama-3.1-SuperNova-Lite
	---
	---
	### These are EXL2 quants for Aura-8B, Measurement file in the main branch, Check revisions for different BPW
	---
	## Aura-8B

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/9y03nVWVnBYU1tHkLwCJy.png)

	## Introduction

	Aura-8B is a state of the art dedicated roleplaying model designed to fulfill your every desire.

	This finetune has seen several hundreds of millions of tokens of instruction and roleplaying data. A Kahneman-Tversky Optimization was applied as a Low Rank Adapter to give this model a unique output style.

	Developed by Aura Industries, with contributions from Anthracite Org

	## Model Details

	- Model Name: Aura-8B
	- Base Model: [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)
	- Model Type: Chat Completions
	- Prompt Format: Llama 3
	- License: Apache-2.0
	- Language: English
	- Max Context: 8,192+ tokens

	## License

	This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

	## Quantizations

	[Static GGUF](https://huggingface.co/mradermacher/Aura-8B-GGUF)

	[Imatrix GGUF](https://huggingface.co/mradermacher/Aura-8B-i1-GGUF)

	[EXL2](https://huggingface.co/NewEden/Aura-8B-EXL2)

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|27.34\|
	\|IFEval (0-Shot) \|72.05\|
	\|BBH (3-Shot) \|30.98\|
	\|MATH Lvl 5 (4-Shot)\|15.03\|
	\|GPQA (0-shot) \| 4.81\|
	\|MuSR (0-shot) \| 9.22\|
	\|MMLU-PRO (5-shot) \|31.93\|

	## Training Configuration

	<details><summary>Click here for Axolotl configs</summary>

	SFT
	```yaml
	base_model: arcee-ai/Llama-3.1-SuperNova-Lite
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	- path: FourOhFour/RP_Phase
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: Nitral-AI/Cybersecurity-ShareGPT
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: Nitral-AI/Medical_Instruct-ShareGPT
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: Nitral-AI/Olympiad_Math-ShareGPT
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: NewEden/Claude-Instruct-5k
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: lodrick-the-lafted/kalo-opus-instruct-3k-filtered
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: Nitral-AI/Creative_Writing-ShareGPT
	type: chat_template
	chat_template: llama3
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: jeiku/Writing
	type: completion
	field: text

	shuffle_merged_datasets: true
	dataset_prepared_path:
	val_set_size: 0.01
	output_dir: ./output/out

	hub_model_id: jeiku/Aura-8B
	hub_strategy: "all_checkpoints"
	push_dataset_to_hub:
	hf_use_auth_token: true

	sequence_len: 8192
	sample_packing: true
	eval_sample_packing: false
	pad_to_sequence_len:

	wandb_project: Aura-8B
	wandb_entity:
	wandb_watch:
	wandb_name: Aura-8B
	wandb_log_model:

	gradient_accumulation_steps: 16
	micro_batch_size: 2
	num_epochs: 2
	optimizer: paged_adamw_8bit
	lr_scheduler: cosine
	learning_rate: 1e-5

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: true
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_ratio: 0.1
	evals_per_epoch: 2
	eval_table_size:
	eval_max_new_tokens:
	saves_per_epoch: 1
	debug:
	deepspeed:
	weight_decay: 0.05
	fsdp:
	fsdp_config:
	special_tokens:
	pad_token: <\|finetune_right_pad_id\|>
	eos_token: <\|eot_id\|>
	```

	KTO
	```yaml
	base_model: jeiku/Aura-8B
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	hub_model_id: jeiku/aurakto
	hub_strategy: "all_checkpoints"
	push_dataset_to_hub:
	hf_use_auth_token: true

	chat_template: llama3

	rl: kto
	rl_beta: 0.2
	kto_desirable_weight: 0.2

	datasets:
	- path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
	type: llama3.argilla

	shuffle_merged_datasets: true
	val_set_size: 0.0
	output_dir: ./outputs/out

	adapter: lora
	lora_model_dir:

	lora_r: 32
	lora_alpha: 64
	lora_dropout: 0.05
	lora_target_linear: true
	lora_fan_in_fan_out:

	sequence_len: 8192
	sample_packing: false
	eval_sample_packing: false
	pad_to_sequence_len: false

	wandb_project: Aura-8B
	wandb_entity:
	wandb_watch:
	wandb_name: Aura-8B
	wandb_log_model:

	gradient_accumulation_steps: 16
	micro_batch_size: 2
	num_epochs: 2
	max_steps: 500

	optimizer: adamw_8bit
	lr_scheduler: cosine
	learning_rate: 0.0001
	weight_decay: 0.05

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: true

	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	use_reentrant: true
	remove_unused_columns: false
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_steps: 10
	evals_per_epoch: 2
	eval_table_size:
	eval_max_new_tokens:
	saves_per_epoch: 1

	debug:
	deepspeed:
	fsdp:
	fsdp_config:
	fsdp:
	fsdp_config:

	special_tokens:
	pad_token: <\|finetune_right_pad_id\|>
	eos_token: <\|eot_id\|>
	```
	</details><br>