Quantization made by Richard Erkhov.

Smaug-Llama-3-70B-Instruct-32K - GGUF

Model creator: https://huggingface.co/abacusai/
Original model: https://huggingface.co/abacusai/Smaug-Llama-3-70B-Instruct-32K/

Name	Quant method	Size
Smaug-Llama-3-70B-Instruct-32K.Q2_K.gguf	Q2_K	24.56GB
Smaug-Llama-3-70B-Instruct-32K.IQ3_XS.gguf	IQ3_XS	27.29GB
Smaug-Llama-3-70B-Instruct-32K.IQ3_S.gguf	IQ3_S	28.79GB
Smaug-Llama-3-70B-Instruct-32K.Q3_K_S.gguf	Q3_K_S	28.79GB
Smaug-Llama-3-70B-Instruct-32K.IQ3_M.gguf	IQ3_M	29.74GB
Smaug-Llama-3-70B-Instruct-32K.Q3_K.gguf	Q3_K	31.91GB
Smaug-Llama-3-70B-Instruct-32K.Q3_K_M.gguf	Q3_K_M	31.91GB
Smaug-Llama-3-70B-Instruct-32K.Q3_K_L.gguf	Q3_K_L	34.59GB
Smaug-Llama-3-70B-Instruct-32K.IQ4_XS.gguf	IQ4_XS	35.64GB
Smaug-Llama-3-70B-Instruct-32K.Q4_0.gguf	Q4_0	37.22GB
Smaug-Llama-3-70B-Instruct-32K.IQ4_NL.gguf	IQ4_NL	37.58GB
Smaug-Llama-3-70B-Instruct-32K.Q4_K_S.gguf	Q4_K_S	37.58GB
Smaug-Llama-3-70B-Instruct-32K.Q4_K.gguf	Q4_K	39.6GB
Smaug-Llama-3-70B-Instruct-32K.Q4_K_M.gguf	Q4_K_M	39.6GB
Smaug-Llama-3-70B-Instruct-32K.Q4_1.gguf	Q4_1	41.27GB
Smaug-Llama-3-70B-Instruct-32K.Q5_0.gguf	Q5_0	45.32GB
Smaug-Llama-3-70B-Instruct-32K.Q5_K_S.gguf	Q5_K_S	45.32GB
Smaug-Llama-3-70B-Instruct-32K.Q5_K.gguf	Q5_K	46.52GB
Smaug-Llama-3-70B-Instruct-32K.Q5_K_M.gguf	Q5_K_M	46.52GB
Smaug-Llama-3-70B-Instruct-32K.Q5_1.gguf	Q5_1	49.36GB
Smaug-Llama-3-70B-Instruct-32K.Q6_K.gguf	Q6_K	53.91GB
Smaug-Llama-3-70B-Instruct-32K.Q8_0.gguf	Q8_0	69.83GB
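
As a rough aid for choosing a file from the table above, the sketch below picks the largest quant whose file size fits a given memory budget. The sizes are copied from the table (a subset, for brevity). The `overhead_gb` parameter is a hypothetical allowance for context and runtime buffers, not a measured value; actual memory use depends on the runtime and context length, so treat the file size as a lower bound only.

```python
# File sizes in GB, copied from the table above (subset for brevity).
QUANT_SIZES_GB = {
    "Q2_K": 24.56,
    "IQ3_XS": 27.29,
    "Q3_K_M": 31.91,
    "IQ4_XS": 35.64,
    "Q4_K_M": 39.6,
    "Q5_K_M": 46.52,
    "Q6_K": 53.91,
    "Q8_0": 69.83,
}


def largest_quant_within(budget_gb, overhead_gb=0.0):
    """Return the largest listed quant that fits the budget, or None.

    overhead_gb is a hypothetical allowance for KV cache and buffers.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= budget_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_quant_within(48.0))                   # Q5_K_M
print(largest_quant_within(48.0, overhead_gb=4.0))  # Q4_K_M
print(largest_quant_within(16.0))                   # None
```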

Original model description:

license: llama3
library_name: transformers
datasets:
- aqua_rat
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
model-index:
- name: Smaug-Llama-3-70B-Instruct-32K
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 77.61
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 49.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 21.22
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.43
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 41.83
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct-32K
      name: Open LLM Leaderboard

Smaug-Llama-3-70B-Instruct-32K

Built with Meta Llama 3

This is a 32K version of Smaug-Llama-3-70B-Instruct. It uses PoSE (https://arxiv.org/abs/2309.10400) and LoRA (https://arxiv.org/abs/2106.09685) adapter transfer. More details are coming soon.
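
To give a feel for the positional skip-wise idea behind PoSE, here is a simplified sketch of how position indices for a short training window can be spread over a longer target range. It is an illustration only, not Abacus.AI's training code, and details may differ from the paper. The function name, the chunk count, and the toy lengths are assumptions chosen for the example.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return train_len position ids drawn from [0, target_len).

    The window is split into contiguous chunks; each chunk after the first
    is shifted by a random, non-decreasing skip so that the ids stay
    strictly increasing while covering a longer positional range.
    """
    if not (1 <= num_chunks <= train_len <= target_len):
        raise ValueError("need 1 <= num_chunks <= train_len <= target_len")
    rng = rng or random.Random()

    # Pick distinct cut points so every chunk is non-empty.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]

    max_skip = target_len - train_len
    skip = 0  # the first chunk keeps its original positions
    position_ids = []
    for i in range(num_chunks):
        if i > 0:
            # Each skip is at least the previous one, so chunks never overlap.
            skip = rng.randint(skip, max_skip)
        position_ids.extend(p + skip for p in range(bounds[i], bounds[i + 1]))
    return position_ids


# Toy sizes for readability; real context lengths are far larger.
print(pose_position_ids(train_len=8, target_len=32, rng=random.Random(0)))
```

Because the skips are sampled afresh for each training example, the model sees many relative distances within the target range while only ever attending over `train_len` tokens at a time.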

Needle-In-A-Haystack (https://github.com/jzhang38/EasyContext) heatmap:

Model Description

Developed by: Abacus.AI
License: https://llama.meta.com/llama3/license/
Finetuned from model: meta-llama/Meta-Llama-3-70B-Instruct.

How to use

The prompt format is unchanged from Llama 3 70B Instruct.
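
For runtimes that expect a raw prompt string (as is common when loading GGUF files), the Llama 3 Instruct layout can be assembled by hand. The sketch below is a minimal illustration; the helper name `format_llama3_prompt` is hypothetical, and the tokenizer's own chat template remains the authoritative source. Some runtimes add the beginning-of-text token themselves, in which case it should be omitted here.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Build a Llama 3 Instruct prompt string from a list of chat messages.

    Each message is a dict with "role" and "content" keys.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(format_llama3_prompt(messages))
```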

Use with transformers

See the snippet below for usage with Transformers:

import transformers
import torch

# Use the 32K checkpoint that this card describes.
model_id = "abacusai/Smaug-Llama-3-70B-Instruct-32K"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

Evaluation

Arena-Hard

Score vs selected others (sourced from: https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge). GPT-4o and Gemini-1.5-pro-latest were missing from the original blog post, so we produced those numbers from a local run using the same methodology.

Model	Score	95% Confidence Interval	Average Tokens
GPT-4-Turbo-2024-04-09	82.6	(-1.8, 1.6)	662
GPT-4o	78.3	(-2.4, 2.1)	685
Gemini-1.5-pro-latest	72.1	(-2.3, 2.2)	630
Claude-3-Opus-20240229	60.4	(-3.3, 2.4)	541
Smaug-Llama-3-70B-Instruct-32K	60.0	(-2.6, 2.1)	844
Smaug-Llama-3-70B-Instruct	56.7	(-2.2, 2.6)	661
GPT-4-0314	50.0	(-0.0, 0.0)	423
Claude-3-Sonnet-20240229	46.8	(-2.1, 2.2)	552
Llama-3-70B-Instruct	41.1	(-2.5, 2.4)	583
GPT-4-0613	37.9	(-2.2, 2.0)	354
Mistral-Large-2402	37.7	(-1.9, 2.6)	400
Mixtral-8x22B-Instruct-v0.1	36.4	(-2.7, 2.9)	430
Qwen1.5-72B-Chat	36.1	(-2.5, 2.2)	474
Command-R-Plus	33.1	(-2.1, 2.2)	541
Mistral-Medium	31.9	(-2.3, 2.4)	485
GPT-3.5-Turbo-0613	24.8	(-1.6, 2.0)	401

Note that we believe the number of tokens/verbosity of the model strongly influences the GPT-4 judge in this case, and at least partially explains the improvement in Arena-Hard score for the 32K model.
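
When reading the Arena-Hard table, it can help to turn each score and its confidence offsets into an absolute interval and check whether two models' intervals overlap. The sketch below does this for a few rows copied from the table. Treating overlap as a sign that a gap may not be meaningful is a rough heuristic assumed here, not part of the Arena-Hard methodology.

```python
# (score, lower offset, upper offset), copied from the Arena-Hard table.
ARENA_HARD = {
    "Claude-3-Opus-20240229": (60.4, -3.3, 2.4),
    "Smaug-Llama-3-70B-Instruct-32K": (60.0, -2.6, 2.1),
    "Smaug-Llama-3-70B-Instruct": (56.7, -2.2, 2.6),
    "Llama-3-70B-Instruct": (41.1, -2.5, 2.4),
}


def interval(model):
    """Return the absolute (low, high) bounds of the 95% interval."""
    score, lower, upper = ARENA_HARD[model]
    return round(score + lower, 1), round(score + upper, 1)


def overlaps(model_a, model_b):
    """True if the two models' intervals share at least one point."""
    low_a, high_a = interval(model_a)
    low_b, high_b = interval(model_b)
    return low_a <= high_b and low_b <= high_a


print(interval("Smaug-Llama-3-70B-Instruct-32K"))
print(overlaps("Smaug-Llama-3-70B-Instruct-32K", "Smaug-Llama-3-70B-Instruct"))
print(overlaps("Smaug-Llama-3-70B-Instruct-32K", "Llama-3-70B-Instruct"))
```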

OpenLLM Leaderboard Manual Evaluation

Model	ARC	Hellaswag	MMLU	TruthfulQA	Winogrande	GSM8K*	Average
Smaug-Llama-3-70B-Instruct-32K	70.1	TBA	TBA	61.9	82.2	TBA	TBA
Llama-3-70B-Instruct	71.4	85.7	80.0	61.8	82.9	91.1	78.8

*GSM8K: The GSM8K numbers quoted here are computed using a recent release of the LM Evaluation Harness. The commit used by the leaderboard has a significant issue, caused by a bug in the stop word configuration for GSM8K, that affects models that tend to use ":" in their responses. The issue is covered in more detail in this GSM8K evaluation discussion. The scores for both Llama-3 and this model are significantly different when evaluated with the updated harness, where the stop word issue has been addressed.
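
To illustrate why a stop word setting can matter so much, the toy sketch below truncates a response at the first occurrence of any stop sequence. It is not the LM Evaluation Harness code, and the stop lists shown are hypothetical. It only demonstrates how an overly broad stop sequence can cut a response off before the final answer appears.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


response = "Let's compute step by step: 3 + 4 = 7. The answer is 7."

# Hypothetical overly broad stop list: the answer is lost.
print(truncate_at_stop(response, [":"]))

# Hypothetical narrower stop list: the full response survives.
print(truncate_at_stop(response, ["Question:"]))
```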

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	34.72
IFEval (0-Shot)	77.61
BBH (3-Shot)	49.07
MATH Lvl 5 (4-Shot)	21.22
GPQA (0-shot)	6.15
MuSR (0-shot)	12.43
MMLU-PRO (5-shot)	41.83
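
As a quick consistency check, the reported average appears to be the unweighted mean of the six benchmark values above. The short sketch below recomputes it from the table; the unweighted-mean assumption is inferred from the numbers rather than stated in the source.

```python
# Benchmark values copied from the table above.
scores = {
    "IFEval (0-Shot)": 77.61,
    "BBH (3-Shot)": 49.07,
    "MATH Lvl 5 (4-Shot)": 21.22,
    "GPQA (0-shot)": 6.15,
    "MuSR (0-shot)": 12.43,
    "MMLU-PRO (5-shot)": 41.83,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 34.72
```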