---
license: apache-2.0
---
# LoRA Experiment

RWKV-5.2-3b-World-DPO is a DPO LoRA fine-tuned model, trained on a preference dataset and merged back into the base model.
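Merging folds the trained LoRA update back into the base weights, so the released checkpoint needs no adapter at inference time. A minimal sketch of that operation, assuming the standard LoRA parameterization (the function and variable names here are illustrative, not the trainer's actual API):

```python
import numpy as np

def merge_lora(w_base, lora_a, lora_b, rank=8, alpha=16):
    """Fold a LoRA update into a base weight matrix.

    w_base: (d_out, d_in) base weight
    lora_a: (rank, d_in) down-projection
    lora_b: (d_out, rank) up-projection
    The low-rank update is scaled by alpha / rank, as in LoRA.
    """
    scale = alpha / rank
    return w_base + scale * (lora_b @ lora_a)

# Toy example: after merging, the adapter matrices can be discarded.
w = np.zeros((4, 4))
a = np.ones((8, 4))   # rank 8, matching the settings below
b = np.ones((4, 8))
merged = merge_lora(w, a, b)  # every entry is (16 / 8) * 8 = 16
```

With rank 8 and alpha 16 (the values used for this run), the update is scaled by 2 before being added to the base weights.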
## Base Model

RWKV-5-World-3B-v2-20231113-ctx4096
## Parameters

- LoRA rank: 8
- LoRA alpha: 16
- Context length: 4096
- Epochs: 19
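For reference, the objective being optimized during DPO training can be sketched as follows. This is a generic illustration of the DPO loss on one preference pair, not code from the linked trainer, and the beta value is a hypothetical default:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference (base) model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)): smaller when the policy prefers the
    # chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# If the policy and the reference agree exactly, logits are 0 and
# the loss starts at -log(0.5).
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The LoRA adapter gives the policy a cheap trainable delta over the frozen base, which doubles as the reference model.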
## Dataset

1,000 randomly chosen preference pairs from https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
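Drawing such a subset can be sketched as below. Dummy records stand in for the real dataset so the snippet runs offline; in practice the pairs would come from the Hugging Face `datasets` library (e.g. `load_dataset("HuggingFaceH4/ultrafeedback_binarized")`), and the field names here are assumptions:

```python
import random

# Stand-in for the full binarized preference set: each record holds a
# preferred ("chosen") and a dispreferred ("rejected") response.
all_pairs = [{"chosen": f"good answer {i}", "rejected": f"bad answer {i}"}
             for i in range(10_000)]

random.seed(0)  # fix the seed so the subset is reproducible
subset = random.sample(all_pairs, k=1000)  # 1,000 pairs, no replacement
```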
## Trainer

https://github.com/OpenMOSE/RWKV-LM-RLHF-DPO-LoRA