Introduction
This is the LoRA adapter used by Riyuechang/Breeze-7B-PTT-Chat-v1, before it is merged into the base model MediaTek-Research/Breeze-7B-Instruct-v1_0.
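Note!
This LoRA adapter was trained with DoRA, which lets the model learn more efficiently.
The cost is that training and inference take considerably longer; inference in particular becomes very slow.
It is recommended to merge this LoRA adapter into the base model before running inference.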
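The recommended merge can be sketched with the PEFT library as follows. This is a minimal sketch, not the author's own script: the function name `merge_adapter` and the choice of bfloat16 are assumptions made for illustration, and the adapter ID is taken from this repository's name.

```python
BASE_MODEL_ID = "MediaTek-Research/Breeze-7B-Instruct-v1_0"
ADAPTER_ID = "Riyuechang/Breeze-7B-PTT-Chat-v1_lora"


def merge_adapter(output_dir, base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Fold the DoRA/LoRA weights into the base model and save the result.

    Hypothetical helper for illustration; it downloads the full base model.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model (bfloat16 is an assumption, matching bf16 training).
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

    # Attach the adapter, then merge its weights into the base layers.
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()

    # Save the merged model and the tokenizer side by side.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

Calling `merge_adapter("./breeze-ptt-merged")` (the path is a placeholder) should leave a standalone checkpoint that can be loaded with `AutoModelForCausalLM.from_pretrained` without PEFT, which avoids the DoRA overhead at inference time.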
Hardware
- Ubuntu 22.04.4 LTS
- NVIDIA GeForce RTX 3060 12G
LoRA parameters
r=8,
lora_alpha=32,
lora_dropout=0.1,
task_type="CAUSAL_LM",
target_modules="all-linear",
bias="none",
use_dora=True,
use_rslora=True
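For context, the parameters above are keyword arguments to PEFT's `LoraConfig`. The sketch below shows one way they might be assembled; the helper names `build_lora_config` and `lora_scaling` are hypothetical. It also works out the adapter scaling factor, since `use_rslora=True` changes it from `lora_alpha / r` to `lora_alpha / sqrt(r)`.

```python
import math

# Values copied from the parameter list above.
LORA_KWARGS = dict(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
    bias="none",
    use_dora=True,
    use_rslora=True,
)


def lora_scaling(r, lora_alpha, use_rslora):
    """Scaling applied to the low-rank update (hypothetical helper)."""
    # rsLoRA divides by sqrt(r); standard LoRA divides by r.
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def build_lora_config():
    """Build the config; needs a PEFT version that supports use_dora."""
    from peft import LoraConfig  # lazy import: PEFT is only needed here

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(f"rsLoRA scaling:   {lora_scaling(8, 32, True):.4f}")
    print(f"standard scaling: {lora_scaling(8, 32, False):.4f}")
```

With r=8 and lora_alpha=32, the rank-stabilized scaling is roughly 11.31, compared with 4.0 under standard LoRA scaling.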
Training parameters
per_device_train_batch_size=28,
gradient_accumulation_steps=1,
num_train_epochs=3,
warmup_ratio=0.1,
learning_rate=2e-5,
bf16=True,
save_strategy="steps",
save_steps=500,
save_total_limit=10,
logging_steps=10,
output_dir=log_output,
optim="paged_adamw_8bit",
gradient_checkpointing=True
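These values look like keyword arguments to `transformers.TrainingArguments`. The sketch below is an assumption about how they could be wired up, not the author's training script. `log_output` is not defined in the source, so the helper takes the output directory as a parameter; `build_training_args` and `effective_batch_size` are hypothetical names.

```python
# Values copied from the parameter list above (output_dir is passed separately).
TRAINING_KWARGS = dict(
    per_device_train_batch_size=28,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    warmup_ratio=0.1,
    learning_rate=2e-5,
    bf16=True,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=10,
    logging_steps=10,
    optim="paged_adamw_8bit",  # this optimizer requires bitsandbytes
    gradient_checkpointing=True,
)


def effective_batch_size(num_devices=1):
    """Sequences consumed per optimizer step (hypothetical helper)."""
    return (
        TRAINING_KWARGS["per_device_train_batch_size"]
        * TRAINING_KWARGS["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir):
    """Build TrainingArguments; output_dir stands in for `log_output`."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

On the single RTX 3060 listed under Hardware, `effective_batch_size()` is 28, since gradient accumulation is set to 1.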
Results
- loss: 1.1035