Triangle104 committed on
Commit bc7c99a
1 Parent(s): f12c44c

Update README.md

Files changed (1)
  1. README.md +80 -10
README.md CHANGED
@@ -19,22 +19,88 @@ Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu
  ---
  Model details:
  -
  The chat template for our models is formatted as:

- <|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

  Or with new lines expanded:

  <|user|>
  How are you doing?
  <|assistant|>
- I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

  It is embedded within the tokenizer as well, for tokenizer.apply_chat_template.

- System prompt
-
@@ -43,10 +109,11 @@ In Ai2 demos, we use this system prompt by default:
  You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.

  The model has not been trained with a specific system prompt in mind.

- Bias, Risks, and Limitations
-
@@ -59,10 +126,13 @@ to train the base Llama 3.1 models, however it is likely to have
  included a mix of Web data and technical sources like books and code.
  See the Falcon 180B model card for an example of this.

  Hyperparameters

  PPO settings for RLVR:

  Learning Rate: 3 × 10⁻⁷
  Discount Factor (gamma): 1.0
  Generalized Advantage Estimation (lambda): 0.95
@@ -82,8 +152,8 @@ Total Episodes: 100,000
  KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
  Warm up ratio (omega): 0.0

- License and use
-
@@ -97,8 +167,8 @@ The models have been fine-tuned using a dataset mix with outputs
  generated from third party models and are subject to additional terms:
  Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).

- Citation
-
  ---
  Model details:
  -
+ Tülu3 is a leading instruction following model family, offering fully
+ open-source data, code, and recipes designed to serve as a
+ comprehensive guide for modern post-training techniques.
+ Tülu3 is designed for state-of-the-art performance on a diversity of
+ tasks in addition to chat, such as MATH, GSM8K, and IFEval.
+
+ Model description
+
+ Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
+ Language(s) (NLP): Primarily English
+ License: Llama 3.1 Community License Agreement
+ Finetuned from model: allenai/Llama-3.1-Tulu-3-8B-DPO
+
+ Model Sources
+
+ Training Repository: https://github.com/allenai/open-instruct
+ Eval Repository: https://github.com/allenai/olmes
+ Paper: https://arxiv.org/abs/2411.15124
+ Demo: https://playground.allenai.org/
+
+ Using the model
+
+ Loading with HuggingFace
+
+ To load the model with HuggingFace, use the following snippet:
+
+ from transformers import AutoModelForCausalLM
+
+ tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
63
+
64
+
65
+ VLLM
66
+
67
+
68
+
69
+ As a Llama base model, the model can be easily served with:
70
+
71
+
72
+ vllm serve allenai/Llama-3.1-Tulu-3-8B
73
+
74
+
75
+ Note that given the long chat template of Llama, you may want to use --max_model_len=8192.
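`vllm serve` exposes an OpenAI-compatible chat-completions endpoint; as a minimal sketch, the request body for it can be built like this (the localhost:8000 URL is vLLM's default, and the `build_chat_request` helper plus its parameter values are illustrative assumptions, not part of this card):

```python
import json

# Assumption: `vllm serve` is running with its default host/port.
URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(user_message, model="allenai/Llama-3.1-Tulu-3-8B",
                       max_tokens=512):
    """Build an OpenAI-style chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("How are you doing?")
body = json.dumps(payload)  # POST this to URL with any HTTP client
```

The server applies the model's chat template itself, so the payload only carries role/content messages.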
+
+ Chat template
+
  The chat template for our models is formatted as:

+ <|user|>\nHow are you doing?\n<|assistant|>\nI'm just a
+ computer program, so I don't have feelings, but I'm functioning as
+ expected. How can I assist you today?<|endoftext|>
+
  Or with new lines expanded:

+
  <|user|>
  How are you doing?
  <|assistant|>
+ I'm just a computer program, so I don't have feelings, but I'm
+ functioning as expected. How can I assist you today?<|endoftext|>
+
  It is embedded within the tokenizer as well, for tokenizer.apply_chat_template.
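The template above can be reproduced in plain Python; a minimal sketch (the `render_chat` helper is an illustration for understanding the format, not part of the tokenizer API, which you would normally use via `tokenizer.apply_chat_template`):

```python
def render_chat(messages):
    """Format messages in the Tulu chat template shown above:
    each turn is <|role|>\n<content>, and assistant turns are
    terminated with <|endoftext|>."""
    out = []
    for m in messages:
        turn = f"<|{m['role']}|>\n{m['content']}"
        if m["role"] == "assistant":
            turn += "<|endoftext|>"
        out.append(turn)
    return "\n".join(out)

prompt = render_chat([{"role": "user", "content": "How are you doing?"}])
# For generation, append the assistant header so the model continues it
# (apply_chat_template does this with add_generation_prompt=True):
prompt += "\n<|assistant|>\n"
```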

+
+ System prompt

  You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.

+
  The model has not been trained with a specific system prompt in mind.

+
+ Bias, Risks, and Limitations

  included a mix of Web data and technical sources like books and code.
  See the Falcon 180B model card for an example of this.

+
  Hyperparameters

+
  PPO settings for RLVR:

+
  Learning Rate: 3 × 10⁻⁷
  Discount Factor (gamma): 1.0
  Generalized Advantage Estimation (lambda): 0.95
 
  KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
  Warm up ratio (omega): 0.0
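The discount and GAE settings above plug into the standard advantage recursion A_t = δ_t + γλ·A_{t+1}, with δ_t = r_t + γ·V(s_{t+1}) − V(s_t); a minimal sketch using the card's γ = 1.0 and λ = 0.95 (the reward and value inputs in any call are made-up illustrations, not training data):

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)   (terminal V is 0)
    A_t     = delta_t + gamma * lam * A_{t+1}
    Computed backwards from the last step.
    """
    advantages = [0.0] * len(rewards)
    next_advantage = 0.0
    next_value = 0.0  # value after the terminal state
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

With γ = 1.0 there is no discounting across the episode, so λ alone controls how far credit from the verifiable reward propagates back through earlier tokens.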

+
+ License and use

  generated from third party models and are subject to additional terms:
  Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).

+
+ Citation