---
license: apache-2.0
---
# LoRA Experiment

RWKV-5.2-3b-World-DPO is a DPO LoRA fine-tuned model, trained on a preference dataset and merged back into the base model.
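Merging folds the trained LoRA update back into the base weights, so the released checkpoint needs no adapter at inference time. A minimal sketch of that operation, assuming the standard LoRA parameterization (the function and variable names here are illustrative, not the trainer's actual API):

```python
import numpy as np

def merge_lora(w_base, lora_a, lora_b, rank=8, alpha=16):
    """Fold a LoRA update into a base weight matrix.

    w_base: (d_out, d_in) base weight
    lora_a: (rank, d_in) down-projection
    lora_b: (d_out, rank) up-projection
    The low-rank update is scaled by alpha / rank, as in LoRA.
    """
    scale = alpha / rank
    return w_base + scale * (lora_b @ lora_a)

# Toy example: after merging, the adapter matrices can be discarded.
w = np.zeros((4, 4))
a = np.ones((8, 4))   # rank 8, matching the settings below
b = np.ones((4, 8))
merged = merge_lora(w, a, b)  # every entry is (16 / 8) * 8 = 16
```

With rank 8 and alpha 16 (the values used for this run), the update is scaled by 2 before being added to the base weights.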
## Base Model

RWKV-5-World-3B-v2-20231113-ctx4096
## Parameters

- LoRA rank: 8
- LoRA alpha: 16
- Context length: 4096
- Epochs: 19
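For reference, the objective being optimized during DPO training can be sketched as follows. This is a generic illustration of the DPO loss on one preference pair, not code from the linked trainer, and the beta value is a hypothetical default:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference (base) model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)): smaller when the policy prefers the
    # chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# If the policy and the reference agree exactly, logits are 0 and
# the loss starts at -log(0.5).
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The LoRA adapter gives the policy a cheap trainable delta over the frozen base, which doubles as the reference model.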
## Dataset

1,000 randomly chosen preference pairs from https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
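Drawing such a subset can be sketched as below. Dummy records stand in for the real dataset so the snippet runs offline; in practice the pairs would come from the Hugging Face `datasets` library (e.g. `load_dataset("HuggingFaceH4/ultrafeedback_binarized")`), and the field names here are assumptions:

```python
import random

# Stand-in for the full binarized preference set: each record holds a
# preferred ("chosen") and a dispreferred ("rejected") response.
all_pairs = [{"chosen": f"good answer {i}", "rejected": f"bad answer {i}"}
             for i in range(10_000)]

random.seed(0)  # fix the seed so the subset is reproducible
subset = random.sample(all_pairs, k=1000)  # 1,000 pairs, no replacement
```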
## Trainer

https://github.com/OpenMOSE/RWKV-LM-RLHF-DPO-LoRA