---
license: apache-2.0
datasets:
- SubMaroon/DTF_Comments_Responses_Counts
language:
- ru
base_model:
- unsloth/Qwen2.5-7B
pipeline_tag: text-generation
---

A continued-pretrained version of the unsloth/Qwen2.5-7B model, trained with Unsloth's low-rank adaptation (LoRA) on a dataset of [DTF](https://dtf.ru) posts. The adapter is already merged into the model.

For pretraining, posts from [SubMaroon/DTF_Comments_Responses_Counts](https://huggingface.co/datasets/SubMaroon/DTF_Comments_Responses_Counts) were selected, deduplicated with a simple `df.unique` pass, and filtered by token length (1000 < x < 128000). The resulting training set contains roughly 75M tokens. Hedged sketches of this preprocessing and of the training setup are given at the end of this card.

LoRA hyperparameters:

```
r=32
target_modules=[
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]
lora_alpha=16
lora_dropout=0
bias="none"
use_gradient_checkpointing="unsloth"
use_rslora=True
random_state=42
```

Training hyperparameters:

```
num_train_epochs=2
train_batch_size=8
gradient_accumulation_steps=16
gradient_checkpointing=False
optim="adamw_8bit"
weight_decay=4e-2
bf16=True
learning_rate=5e-5
lr_scheduler_type="cosine"
packing=True
seed=42
```

Training time:

- NVIDIA Tesla A100 80GB: ~8.5 hours
- NVIDIA RTX 3090 Ti: ~33.5 hours

[Wandb run](https://wandb.ai/a_okshus/DTF_comments/runs/fr5hfq6g?nw=nwusera_okshus)

[GitHub: TODO]()
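
Since the adapter is already merged, the model can be loaded with plain Hugging Face `transformers`; no PEFT step is needed. A minimal generation sketch (the repository id below is a placeholder; substitute this model's actual id):

```
# Minimal generation sketch with Hugging Face Transformers.
# "<this-repo-id>" is a placeholder: the card does not state the final
# repository id, so substitute the real one. No PEFT loading is needed,
# since the LoRA adapter is already merged into the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

prompt = "Лучшие игры 2024 года:"  # a DTF-style Russian prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```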
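
The preprocessing described above, as a hedged sketch. The `text` column name, the `train` split, and counting lengths with the base model's tokenizer are assumptions; the card only mentions `df.unique` and the token-length bounds:

```
# Hedged reconstruction of the preprocessing: deduplication plus a
# 1000 < tokens < 128000 length filter. The "text" column, the "train"
# split, and the use of the base tokenizer for counting are assumptions.
import pandas as pd
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B")

ds = load_dataset("SubMaroon/DTF_Comments_Responses_Counts", split="train")
df = ds.to_pandas()

# Deduplicate posts, as in the `df.unique` pass mentioned above.
texts = pd.Series(df["text"].unique())

# Keep posts whose token length falls inside the stated bounds.
lengths = texts.map(lambda t: len(tokenizer(t).input_ids))
texts = texts[(lengths > 1000) & (lengths < 128000)]
```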
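
The LoRA hyperparameters listed above map onto Unsloth's API roughly as follows. `max_seq_length` is not stated on the card and is assumed here from the upper length bound:

```
# Hedged sketch of the adapter setup with Unsloth, using the LoRA
# hyperparameters listed above. max_seq_length is an assumption taken
# from the length filter's upper bound.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B",
    max_seq_length=128000,  # assumption
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=True,
    random_state=42,
)
```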
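
The training hyperparameters correspond to a TRL `SFTTrainer` run along these lines, continuing from the two sketches above. Exact kwargs vary across TRL versions, and `dataset_text_field`, `max_seq_length`, and `output_dir` are assumptions; the actual run may have used Unsloth's own trainer wrapper instead:

```
# Hedged sketch of the training run with TRL's SFTTrainer, mapping the
# hyperparameters listed above onto transformers.TrainingArguments.
# dataset_text_field, max_seq_length, and output_dir are assumptions.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Build a datasets.Dataset from the filtered posts of the preprocessing sketch.
train_dataset = Dataset.from_pandas(texts.to_frame(name="text"), preserve_index=False)

trainer = SFTTrainer(
    model=model,                  # the LoRA-wrapped model from the sketch above
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",    # assumption
    max_seq_length=128000,        # assumption
    packing=True,
    args=TrainingArguments(
        num_train_epochs=2,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=16,
        gradient_checkpointing=False,
        optim="adamw_8bit",
        weight_decay=4e-2,
        bf16=True,
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        seed=42,
        output_dir="outputs",     # assumption
    ),
)
trainer.train()
```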