Commit e69a837 by Triangle104 · verified · 1 Parent(s): c38b932

Update README.md

Files changed (1): README.md +191 -0

This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.

---
## Model details

Tülu3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide to modern post-training techniques.
Tülu3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.

### Model description

- Model type: A model trained on a mix of publicly available, synthetic, and human-created datasets.
- Language(s) (NLP): Primarily English
- License: Llama 3.1 Community License Agreement
- Finetuned from model: allenai/Llama-3.1-Tulu-3-8B-DPO

#### Model Sources

- Training Repository: https://github.com/allenai/open-instruct
- Eval Repository: https://github.com/allenai/olmes
- Paper: https://arxiv.org/abs/2411.15124
- Demo: https://playground.allenai.org/

### Using the model

#### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

```python
from transformers import AutoModelForCausalLM

# Loads the original Transformers weights from the Hugging Face Hub (downloading
# them on first use); this is not the GGUF file in this repository.
tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
```

#### vLLM

As a Llama base model, the model can be easily served with:

```shell
vllm serve allenai/Llama-3.1-Tulu-3-8B
```

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.

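Once running, the server exposes an OpenAI-compatible HTTP API. The sketch below builds a chat request with only the Python standard library. It assumes the server listens on vLLM's default address (`http://localhost:8000`) and that the served model name is the repo ID passed to `vllm serve`; `build_chat_request` and `send_chat_request` are hypothetical helpers, not part of vLLM.

```python
import json
import urllib.request

# Assumptions: `vllm serve` is running on this machine with its default port
# (8000), and the served model name is the repo ID passed on the command line.
CHAT_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "allenai/Llama-3.1-Tulu-3-8B"


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `send_chat_request(build_chat_request("How are you doing?"))` returns the assistant's reply. The server applies the model's chat template itself, so the request carries plain `messages` rather than a pre-formatted prompt string.
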
#### Chat template

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is also embedded in the tokenizer, for use with `tokenizer.apply_chat_template`.

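To make the layout concrete, here is a minimal Python sketch that renders a conversation into the string format above. `format_tulu_chat` is a hypothetical helper, not part of transformers; in practice prefer `tokenizer.apply_chat_template`, whose embedded template is authoritative. The sketch handles only the two roles shown, and it assumes consecutive turns are simply concatenated with no extra separator. The `add_generation_prompt` flag mirrors the argument of the same name in `apply_chat_template`.

```python
from typing import Dict, List

# Special tokens as they appear in the template shown above.
USER_TAG = "<|user|>\n"
ASSISTANT_TAG = "<|assistant|>\n"
EOS = "<|endoftext|>"


def format_tulu_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render user/assistant turns in the chat format shown above.

    Hypothetical helper: only 'user' and 'assistant' roles are handled, and
    turns are concatenated directly (an assumption for multi-turn chats).
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(USER_TAG + content + "\n")
        elif role == "assistant":
            # Completed assistant turns end with the end-of-text token.
            parts.append(ASSISTANT_TAG + content + EOS)
        else:
            raise ValueError(f"unsupported role: {role!r}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(ASSISTANT_TAG)
    return "".join(parts)
```

For example, `format_tulu_chat([{"role": "user", "content": "How are you doing?"}], add_generation_prompt=True)` returns `"<|user|>\nHow are you doing?\n<|assistant|>\n"`, the prefix a model would continue from.
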
#### System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.

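Because the prompt is optional, one way to apply it is to prepend it as a `system` message only when the caller has not supplied one. The sketch below assumes OpenAI-style `{"role", "content"}` message dicts, the shape accepted by both `tokenizer.apply_chat_template` and vLLM's chat endpoint. `with_default_system_prompt` is a hypothetical helper, and this card does not show how the template renders a system turn.

```python
from typing import Dict, List

# Default system prompt used in Ai2 demos (quoted from this card).
DEFAULT_SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI."
)


def with_default_system_prompt(
    messages: List[Dict[str, str]], system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> List[Dict[str, str]]:
    """Return a copy of `messages` that starts with a system message.

    Hypothetical helper: a conversation that already begins with a system
    message is returned unchanged; otherwise the default prompt is prepended.
    The input list is never mutated.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)
```

Keeping the prompt opt-in like this matches the note above: since the model was not trained around one fixed system prompt, callers can swap in their own or omit it.
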
### Bias, Risks, and Limitations

The Tülu3 models have limited safety training and are not deployed with in-the-loop filtering of responses the way ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it likely included a mix of web data and technical sources such as books and code.
See the Falcon 180B model card for an example of this.

### Hyperparameters

PPO settings for RLVR (Reinforcement Learning with Verifiable Rewards):

- Learning Rate: 3 × 10⁻⁷
- Discount Factor (gamma): 1.0
- Generalized Advantage Estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO Update Iterations (K): 4
- PPO's Clipping Coefficient (epsilon): 0.2
- Value Function Coefficient (c1): 0.1
- Gradient Norm Threshold: 1.0
- Learning Rate Schedule: Linear
- Generation Temperature: 1.0
- Batch Size (effective): 512
- Max Token Length: 2,048
- Max Prompt Token Length: 2,048
- Penalty Reward Value for Responses without an EOS Token: -10.0
- Response Length: 1,024 (but 2,048 for MATH)
- Total Episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0

### License and use

All Llama 3.1 Tülu3 models are released under Meta's Llama 3.1 Community License Agreement.
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc.
Tülu3 is intended for research and educational use.
For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (models were improved using Qwen 2.5).

### Citation

If Tülu3 or any of the related materials were helpful to your work, please cite:

```bibtex
@article{lambert2024tulu3,
  title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year = {2024},
  email = {[email protected]}
}
```

---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)