chameleon-lizard/Qwen-2.5-7B-DTF

Text Generation


chameleon-lizard committed 7 days ago

Commit 9970ed7 (verified) · 1 Parent(s): 7e48234

Update README.md

Files changed (1)

README.md +28 -1


@@ -13,7 +13,29 @@ A continued pretrained version of unsloth/Qwen2.5-7B model using unsloth's low r
 For pretraining, posts from [SubMaroon/DTF_Comments_Responses_Counts](https://huggingface.co/datasets/SubMaroon/DTF_Comments_Responses_Counts) were selected, deduplicated with a simple `df.unique` call, and filtered to lengths strictly between 1000 and 128000 tokens.
-Hyperparameters:
 ```
 num_train_epochs=2
@@ -29,6 +51,11 @@ packing=True,
 seed=42
 ```
 [Wandb](https://wandb.ai/a_okshus/DTF_comments/runs/fr5hfq6g?nw=nwusera_okshus)
 [GitHub: TODO]()
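
The preprocessing described in the context line of this hunk (deduplicate the posts, then keep those with 1000 < x < 128000 tokens) can be sketched roughly as follows. This is only an illustration under assumptions: the README says `df.unique` was used, whereas this sketch swaps in a plain set-based deduplication so that it runs without a dataframe library. The names `dedupe_and_filter` and `count_tokens` are hypothetical, and the real pipeline would presumably count tokens with the model's tokenizer.

```python
from typing import Callable, Iterable, List


def dedupe_and_filter(
    posts: Iterable[str],
    count_tokens: Callable[[str], int],
    min_tokens: int = 1000,
    max_tokens: int = 128000,
) -> List[str]:
    """Drop exact duplicates, then keep posts with min_tokens < length < max_tokens.

    `count_tokens` is a hypothetical callable; the README does not say
    which tokenizer was used to measure length.
    """
    seen = set()
    kept = []
    for post in posts:
        if post in seen:
            continue  # exact duplicate of an earlier post
        seen.add(post)
        # Both bounds are exclusive, matching "1000 < x < 128000".
        if min_tokens < count_tokens(post) < max_tokens:
            kept.append(post)
    return kept


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer for demonstration only: counts whitespace-separated words."""
    return len(text.split())
```

With a real tokenizer, `count_tokens` would wrap its encode call and return the number of token ids; the whitespace counter above is only there to keep the sketch self-contained.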

 For pretraining, posts from [SubMaroon/DTF_Comments_Responses_Counts](https://huggingface.co/datasets/SubMaroon/DTF_Comments_Responses_Counts) were selected, deduplicated with a simple `df.unique` call, and filtered to lengths strictly between 1000 and 128000 tokens.
+LoRA hyperparameters:
+```
+r=32
+target_modules=[
+    "q_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "up_proj",
+    "down_proj",
+]
+lora_alpha=16
+lora_dropout=0
+bias="none"
+use_gradient_checkpointing='unsloth'
+use_rslora=True
+random_state=42
+```
+Training hyperparameters:
 ```
 num_train_epochs=2
 seed=42
 ```
+Training time:
+- NVIDIA A100 80GB: ~8.5 hours
+- NVIDIA RTX 3090 Ti: ~33.5 hours
 [Wandb](https://wandb.ai/a_okshus/DTF_comments/runs/fr5hfq6g?nw=nwusera_okshus)
 [GitHub: TODO]()
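
One detail of the LoRA hyperparameters added in this commit may be worth spelling out: with `use_rslora=True`, the adapter update is scaled by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`. The sketch below only does that arithmetic for the values in the diff (`r=32`, `lora_alpha=16`). The helper name `lora_scaling` is hypothetical, and the formula is the one documented for PEFT's `use_rslora` option; that unsloth applies the same scaling here is an assumption.

```python
import math


def lora_scaling(r: int, lora_alpha: float, use_rslora: bool) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA (rsLoRA)
    uses lora_alpha / sqrt(r). Hypothetical helper for illustration.
    """
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# Values taken from the README diff.
r, lora_alpha = 32, 16

standard = lora_scaling(r, lora_alpha, use_rslora=False)
rank_stabilized = lora_scaling(r, lora_alpha, use_rslora=True)

print(f"standard LoRA scaling: {standard:.3f}")
print(f"rsLoRA scaling:        {rank_stabilized:.3f}")
```

Under that assumption, the same `lora_alpha=16` gives a noticeably larger effective scale with rsLoRA (about 2.83) than it would with standard LoRA (0.5), which is relevant when comparing these settings against runs that do not set `use_rslora`.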