Quantization made by Richard Erkhov.
Llama3-8B-SuperNova-Spectrum-Hermes-DPO - GGUF
- Model creator: https://huggingface.co/yuvraj17/
- Original model: https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO/
Original model description:
language: - en license: apache-2.0 library_name: transformers tags: - dpo - rlhf - trl pipeline_tag: text-generation model-index: - name: Llama3-8B-SuperNova-Spectrum-Hermes-DPO results: - task: type: text-generation name: Text Generation dataset: name: IFEval (0-Shot) type: HuggingFaceH4/ifeval args: num_few_shot: 0 metrics: - type: inst_level_strict_acc and prompt_level_strict_acc value: 46.91 name: strict accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: BBH (3-Shot) type: BBH args: num_few_shot: 3 metrics: - type: acc_norm value: 21.24 name: normalized accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MATH Lvl 5 (4-Shot) type: hendrycks/competition_math args: num_few_shot: 4 metrics: - type: exact_match value: 5.14 name: exact match source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GPQA (0-shot) type: Idavidrein/gpqa args: num_few_shot: 0 metrics: - type: acc_norm value: 6.94 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MuSR (0-shot) type: TAUR-Lab/MuSR args: num_few_shot: 0 metrics: - type: acc_norm value: 9.62 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU-PRO (5-shot) type: TIGER-Lab/MMLU-Pro config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 18.16 name: accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO name: Open LLM Leaderboard
Llama3-8B-SuperNova-Spectrum-Hermes-DPO
This model is a DPO fine-tuned version of my DARE_TIES
merged Model yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties
on the yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k dataset.
DPO (Direct Preference Optimization):
Direct Preference Optimization (DPO) is a fine-tuning technique that focuses on aligning a model's responses with human preferences or ranking data without requiring reinforcement learning steps, like in RLHF.
Training:
- Trained on 1x A40s (48GB VRAM) using the HuggingFace TRL.
- QLoRA(
4-bit precision
) for 1 epoch# LoRA configuration peft_config = LoraConfig( r=32, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM", target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'] )
Training Params
The following hyperparameters were used during training:
- learning_rate: 5e-05
- beta=0.1
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1
Training Time = 1:57:00 hours
Weight & Biases Report
π» Usage
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
π Evaluation Scores
Coming Soon
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 18.00 |
IFEval (0-Shot) | 46.91 |
BBH (3-Shot) | 21.24 |
MATH Lvl 5 (4-Shot) | 5.14 |
GPQA (0-shot) | 6.94 |
MuSR (0-shot) | 9.62 |
MMLU-PRO (5-shot) | 18.16 |