版本資訊

使用新的噪聲較小(理論上)的數據訓練
Lora使用了更大的r(32)
取消了Dora
因為Dora的提升有限,還會大幅降低訓練和推理的效率

簡介

Riyuechang/Breeze-7B-PTT-Chat-v2所使用的,未與主模型MediaTek-Research/Breeze-7B-Instruct-v1_0合併的lora模型

設備

  • Ubuntu 22.04.4 LTS
  • NVIDIA GeForce RTX 3060 12G

Lora參數

r=32,
lora_alpha=32,
lora_dropout=0.1,
task_type="CAUSAL_LM",
target_modules="all-linear",
bias="none",
use_rslora=True

訓練參數

per_device_train_batch_size=28,  
gradient_accumulation_steps=1,  
num_train_epochs=3,  
warmup_ratio=0.1,  
learning_rate=2e-5,  
bf16=True,  
save_strategy="steps",  
save_steps=1000,  
save_total_limit=5,  
logging_steps=10,  
output_dir=log_output,  
optim="paged_adamw_8bit",  
gradient_checkpointing=True

結果

  • loss: 0.9391
Downloads last month
1
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Riyuechang/Breeze-7B-PTT-Chat-v2_lora

Adapter
(22)
this model

Dataset used to train Riyuechang/Breeze-7B-PTT-Chat-v2_lora