namratanwani/information-extraction-llama3-8B-4bit-finetuned

information-extraction

namratanwani committed on May 30

Commit cd1499c • 1 Parent(s): 840fa68

Update README.md

Files changed (1)

README.md +23 -23

@@ -110,30 +110,30 @@ Use the code below to get started with the model.
 - **Training regime:**
 ```max_seq_length = 2000
-``` trainer = SFTTrainer(
-```     model = model,
-```     tokenizer = tokenizer,
-```     train_dataset = train,
-```     dataset_text_field = "text",
-```     max_seq_length = max_seq_length,
-```     dataset_num_proc = 2,
-```     packing = False, # Can make training 5x faster for short sequences.
-```     args = TrainingArguments(
-```         per_device_train_batch_size = 2,
-```         gradient_accumulation_steps = 4,
- ```        warmup_steps = 5,
-      ```    max_steps = 50,
-      ```   learning_rate = 2e-4,
-      ```   fp16 = not is_bfloat16_supported(),
-      ```   bf16 = is_bfloat16_supported(),
-      ```   logging_steps = 1,
-      ```   optim = "adamw_8bit",
-      ```   weight_decay = 0.01,
-      ```   lr_scheduler_type = "linear",
-      ```   seed = 3407,
-      ```   output_dir = "outputs",
     ),
-)
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

 - **Training regime:**
 ```max_seq_length = 2000
+  trainer = SFTTrainer(
+    model = model,
+     tokenizer = tokenizer,
+     train_dataset = train,
+     dataset_text_field = "text",
+     max_seq_length = max_seq_length,
+     dataset_num_proc = 2,
+     packing = False, # Can make training 5x faster for short sequences.
+     args = TrainingArguments(
+         per_device_train_batch_size = 2,
+         gradient_accumulation_steps = 4,
+         warmup_steps = 5,
+         max_steps = 50,
+         learning_rate = 2e-4,
+         fp16 = not is_bfloat16_supported(),
+         bf16 = is_bfloat16_supported(),
+         logging_steps = 1,
+         optim = "adamw_8bit",
+         weight_decay = 0.01,
+         lr_scheduler_type = "linear",
+         seed = 3407,
+         output_dir = "outputs",
     ),
+)```
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
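
The training regime in the diff fixes a per-device batch size, gradient accumulation, a step budget, and a linear schedule with warmup. The minimal sketch below works out what those numbers imply, using only values copied from the `TrainingArguments` shown. `NUM_DEVICES = 1` is an assumption (the README does not state the GPU count), and `effective_batch_size` and `linear_lr` are hypothetical helper names introduced here for illustration, not part of any library. The schedule mirrors the usual linear warmup followed by linear decay to zero, which is how the `"linear"` scheduler type is commonly understood to behave.

```python
# Values copied from the TrainingArguments in the diff above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 5
MAX_STEPS = 50
LEARNING_RATE = 2e-4

# Assumption: a single device; the README does not state the GPU count.
NUM_DEVICES = 1


def effective_batch_size(num_devices: int = NUM_DEVICES) -> int:
    """Sequences that contribute to one optimizer step."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * (step / WARMUP_STEPS)
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * (remaining / (MAX_STEPS - WARMUP_STEPS))


if __name__ == "__main__":
    batch = effective_batch_size()
    print("sequences per optimizer step:", batch)
    print("sequences seen over the run:", batch * MAX_STEPS)
    for step in (0, 1, 5, 25, 50):
        print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

Under the single-device assumption this gives 8 sequences per optimizer step and 400 sequences over the 50-step run. Because `packing = False`, each of those is one training example truncated to at most `max_seq_length` tokens.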