namratanwani committed
Commit cd1499c
1 Parent(s): 840fa68
Update README.md

README.md CHANGED
@@ -110,30 +110,30 @@ Use the code below to get started with the model.
 - **Training regime:**
 
 ```max_seq_length = 2000
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+trainer = SFTTrainer(
+model = model,
+tokenizer = tokenizer,
+train_dataset = train,
+dataset_text_field = "text",
+max_seq_length = max_seq_length,
+dataset_num_proc = 2,
+packing = False, # Can make training 5x faster for short sequences.
+args = TrainingArguments(
+per_device_train_batch_size = 2,
+gradient_accumulation_steps = 4,
+warmup_steps = 5,
+max_steps = 50,
+learning_rate = 2e-4,
+fp16 = not is_bfloat16_supported(),
+bf16 = is_bfloat16_supported(),
+logging_steps = 1,
+optim = "adamw_8bit",
+weight_decay = 0.01,
+lr_scheduler_type = "linear",
+seed = 3407,
+output_dir = "outputs",
 ),
-)
+)```
 #### Speeds, Sizes, Times [optional]
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
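Note on the training regime in this hunk: the sketch below works out what the `TrainingArguments` values imply, namely the effective batch size and the learning rate at a given step. It is a plain-Python approximation for illustration, not code from the commit. It assumes a single device (the commit does not state the GPU count) and assumes that `lr_scheduler_type = "linear"` means linear warmup from zero followed by linear decay to zero at `max_steps`. The function names `effective_batch_size` and `linear_lr` are hypothetical helpers.

```python
# Values copied from the TrainingArguments shown in the diff.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 5
MAX_STEPS = 50
LEARNING_RATE = 2e-4
NUM_DEVICES = 1  # assumption: the commit does not say how many devices were used


def effective_batch_size(per_device, accumulation, num_devices=NUM_DEVICES):
    """Sequences contributing to each optimizer step."""
    return per_device * accumulation * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Assumed schedule: linear warmup to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE,
                                 GRADIENT_ACCUMULATION_STEPS)
    print("effective batch size:", batch)
    print("sequences seen over the run:", batch * MAX_STEPS)
    for step in (0, 2, 5, 25, 50):
        print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions each optimizer step covers 8 sequences, so the 50-step run sees at most 400 training sequences, which is short for a fine-tune and worth keeping in mind when reading the results.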