---
license: apache-2.0
datasets:
- SubMaroon/DTF_Comments_Responses_Counts
language:
- ru
base_model:
- unsloth/Qwen2.5-7B
pipeline_tag: text-generation
---

A continued-pretrained version of the unsloth/Qwen2.5-7B model, trained with Unsloth's low-rank adaptation (LoRA) on a corpus of [DTF](https://dtf.ru) posts. The LoRA adapter is already merged into the model.
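
Because the adapter is already merged, the checkpoint loads like any other causal LM. The snippet below is a minimal usage sketch with 🤗 Transformers; the repository id is a hypothetical placeholder and the generation settings are illustrative only.

```python
# Minimal usage sketch (the model id below is a placeholder for this repository's id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SubMaroon/Qwen2.5-7B-DTF"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Лучшие игры этого года:"  # example Russian prompt, matching the DTF domain
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```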

For pretraining, posts from [SubMaroon/DTF_Comments_Responses_Counts](https://huggingface.co/datasets/SubMaroon/DTF_Comments_Responses_Counts) were selected, deduplicated with a simple `df.unique()`, and filtered to lengths between 1,000 and 128,000 tokens.
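
A rough sketch of that selection step is shown below; the column name `text` and the exact tokenization call are assumptions, not taken from the actual preprocessing code.

```python
# Hypothetical reconstruction of the dedup + length-filter step described above.
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B")

df = pd.read_parquet("dtf_posts.parquet")  # assumed local export of the dataset
texts = pd.Series(df["text"].unique())     # simple deduplication, as described above

# Keep posts whose token count lies strictly between 1000 and 128000.
lengths = texts.map(lambda t: len(tokenizer(t, add_special_tokens=False).input_ids))
filtered = texts[(lengths > 1000) & (lengths < 128000)]
```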

LoRA hyperparameters:

```
r=32
target_modules=[
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
]
lora_alpha=16
lora_dropout=0
bias="none"
use_gradient_checkpointing="unsloth"
use_rslora=True
random_state=42
```
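
For reference, these settings correspond to an Unsloth LoRA setup roughly like the sketch below; values not listed in this card (for example `max_seq_length` and 4-bit loading) are assumptions.

```python
# Rough sketch of how the LoRA settings above plug into Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B",
    max_seq_length=128000,   # assumption: matches the upper length filter
    load_in_4bit=False,      # assumption: not stated in the card
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=True,
    random_state=42,
)
```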

Training hyperparameters:

```
num_train_epochs=2
train_batch_size=8
gradient_accumulation_steps=16
gradient_checkpointing=False
optim="adamw_8bit"
weight_decay=4e-2
bf16=True
learning_rate=5e-5
lr_scheduler_type="cosine"
packing=True
seed=42
```
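
`packing=True` points to TRL's `SFTTrainer` (or Unsloth's wrapper around it); the sketch below shows how these values might be wired up. The dataset handling, text column name, and output path are assumptions, and argument names can differ across TRL versions.

```python
# Sketch of a continued-pretraining run with the hyperparameters above, using TRL's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("SubMaroon/DTF_Comments_Responses_Counts", split="train")

args = SFTConfig(
    output_dir="qwen2.5-7b-dtf-cpt",    # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=8,       # "train_batch_size" above, assumed per device
    gradient_accumulation_steps=16,
    gradient_checkpointing=False,
    optim="adamw_8bit",
    weight_decay=4e-2,
    bf16=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    packing=True,
    dataset_text_field="text",           # assumed column name
    seed=42,
)

trainer = SFTTrainer(
    model=model,             # the LoRA-wrapped model from the previous sketch
    train_dataset=dataset,
    args=args,
)
trainer.train()
```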

Training time:

- NVIDIA Tesla A100 80 GB: ~8.5 hours
- NVIDIA RTX 3090 Ti: ~33.5 hours

[Wandb](https://wandb.ai/a_okshus/DTF_comments/runs/fr5hfq6g?nw=nwusera_okshus)

[GitHub: TODO]()