Triangle104 committed on
Commit bc7c99a
1 Parent(s): f12c44c

Update README.md

Files changed (1)
  1. README.md +80 -10
README.md CHANGED
@@ -19,22 +19,88 @@ Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu
  ---
  Model details:
  -
  The chat template for our models is formatted as:

- <|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

  Or with new lines expanded:

  <|user|>
  How are you doing?
  <|assistant|>
- I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

  It is embedded within the tokenizer as well, for tokenizer.apply_chat_template.

- System prompt
-
@@ -43,10 +109,11 @@ In Ai2 demos, we use this system prompt by default:
  You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.

  The model has not been trained with a specific system prompt in mind.

- Bias, Risks, and Limitations
-
@@ -59,10 +126,13 @@ to train the base Llama 3.1 models, however it is likely to have
  included a mix of Web data and technical sources like books and code.
  See the Falcon 180B model card for an example of this.

  Hyperparameters

  PPO settings for RLVR:

  Learning Rate: 3 × 10⁻⁷
  Discount Factor (gamma): 1.0
  Generalized Advantage Estimation (lambda): 0.95
@@ -82,8 +152,8 @@ Total Episodes: 100,000
  KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
  Warm up ratio (omega): 0.0

- License and use
-
@@ -97,8 +167,8 @@ The models have been fine-tuned using a dataset mix with outputs
  generated from third party models and are subject to additional terms:
  Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).

- Citation
-
  ---
  Model details:
  -
+ Tülu3 is a leading instruction following model family, offering fully
+ open-source data, code, and recipes designed to serve as a
+ comprehensive guide for modern post-training techniques.
+ Tülu3 is designed for state-of-the-art performance on a diversity of
+ tasks in addition to chat, such as MATH, GSM8K, and IFEval.
+
+ Model description
+
+ Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
+ Language(s) (NLP): Primarily English
+ License: Llama 3.1 Community License Agreement
+ Finetuned from model: allenai/Llama-3.1-Tulu-3-8B-DPO
+
+ Model Sources
+
+ Training Repository: https://github.com/allenai/open-instruct
+ Eval Repository: https://github.com/allenai/olmes
+ Paper: https://arxiv.org/abs/2411.15124
+ Demo: https://playground.allenai.org/
+
+ Using the model
+
+ Loading with HuggingFace
+
+ To load the model with HuggingFace, use the following snippet:
+
+ from transformers import AutoModelForCausalLM
+
+ tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
63
+
64
+
65
+ VLLM
66
+
67
+
68
+
69
+ As a Llama base model, the model can be easily served with:
70
+
71
+
72
+ vllm serve allenai/Llama-3.1-Tulu-3-8B
73
+
74
+
75
+ Note that given the long chat template of Llama, you may want to use --max_model_len=8192.
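`vllm serve` exposes an OpenAI-compatible chat-completions endpoint; as a minimal sketch, the request body for it can be built like this (the localhost:8000 URL is vLLM's default, and the `build_chat_request` helper plus its parameter values are illustrative assumptions, not part of this card):

```python
import json

# Assumption: `vllm serve` is running with its default host/port.
URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(user_message, model="allenai/Llama-3.1-Tulu-3-8B",
                       max_tokens=512):
    """Build an OpenAI-style chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("How are you doing?")
body = json.dumps(payload)  # POST this to URL with any HTTP client
```

The server applies the model's chat template itself, so the payload only carries role/content messages.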
+
+ Chat template
+
  The chat template for our models is formatted as:

+ <|user|>\nHow are you doing?\n<|assistant|>\nI'm just a
+ computer program, so I don't have feelings, but I'm functioning as
+ expected. How can I assist you today?<|endoftext|>
+
  Or with new lines expanded:

+
  <|user|>
  How are you doing?
  <|assistant|>
+ I'm just a computer program, so I don't have feelings, but I'm
+ functioning as expected. How can I assist you today?<|endoftext|>
+
  It is embedded within the tokenizer as well, for tokenizer.apply_chat_template.
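The template above can be reproduced in plain Python; a minimal sketch (the `render_chat` helper is an illustration for understanding the format, not part of the tokenizer API, which you would normally use via `tokenizer.apply_chat_template`):

```python
def render_chat(messages):
    """Format messages in the Tulu chat template shown above:
    each turn is <|role|>\n<content>, and assistant turns are
    terminated with <|endoftext|>."""
    out = []
    for m in messages:
        turn = f"<|{m['role']}|>\n{m['content']}"
        if m["role"] == "assistant":
            turn += "<|endoftext|>"
        out.append(turn)
    return "\n".join(out)

prompt = render_chat([{"role": "user", "content": "How are you doing?"}])
# For generation, append the assistant header so the model continues it
# (apply_chat_template does this with add_generation_prompt=True):
prompt += "\n<|assistant|>\n"
```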

+
+ System prompt

  You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.

+
  The model has not been trained with a specific system prompt in mind.

+
+ Bias, Risks, and Limitations

  included a mix of Web data and technical sources like books and code.
  See the Falcon 180B model card for an example of this.

+
  Hyperparameters

+
  PPO settings for RLVR:

+
  Learning Rate: 3 × 10⁻⁷
  Discount Factor (gamma): 1.0
  Generalized Advantage Estimation (lambda): 0.95
 
  KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
  Warm up ratio (omega): 0.0
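The discount and GAE settings above plug into the standard advantage recursion A_t = δ_t + γλ·A_{t+1}, with δ_t = r_t + γ·V(s_{t+1}) − V(s_t); a minimal sketch using the card's γ = 1.0 and λ = 0.95 (the reward and value inputs in any call are made-up illustrations, not training data):

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)   (terminal V is 0)
    A_t     = delta_t + gamma * lam * A_{t+1}
    Computed backwards from the last step.
    """
    advantages = [0.0] * len(rewards)
    next_advantage = 0.0
    next_value = 0.0  # value after the terminal state
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

With γ = 1.0 there is no discounting across the episode, so λ alone controls how far credit from the verifiable reward propagates back through earlier tokens.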

+
+ License and use

  generated from third party models and are subject to additional terms:
  Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).

+
+ Citation