Description

This is the mistral-7b-sft-beta model fine-tuned with off-policy WPO. For details, see the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
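
The card does not include usage instructions, so the following is only a minimal sketch of how such a checkpoint is typically loaded with the Hugging Face `transformers` library. It assumes that the repository id is `wzhouad/zephyr-7B-WPO-FP` (taken from the collection line on this page), that `torch` and `transformers` are installed, and that plain-text prompting is acceptable. The helper name `generate` is hypothetical, and no prompt or chat format is confirmed by the card.

```python
# Assumed repository id, inferred from the collection line on this page.
MODEL_ID = "wzhouad/zephyr-7B-WPO-FP"


def generate(prompt, max_new_tokens=128):
    """Hypothetical helper: load the checkpoint and complete a plain-text prompt."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The card lists the tensor type as F32, so the weights are loaded as float32.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain preference optimization in one sentence.")` would download the weights on first use, so it needs network access and enough memory for a 7B-parameter model in float32.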

License

This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.

Model size: 7.24B params (Safetensors)
Tensor type: F32
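
As a rough illustration of what these figures imply, the sketch below estimates the memory needed for the weights alone. The 7.24B parameter count and the F32 tensor type come from this card. The BF16 row is a hypothetical comparison, not something the card offers, and the function name `weight_gib` is made up for this example. Activations, the KV cache, and framework overhead are not counted.

```python
# Parameter count as listed on the card.
NUM_PARAMS = 7.24e9

# Bytes per parameter. F32 is the card's tensor type; BF16 is a
# hypothetical cast included only for comparison.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def weight_gib(num_params, dtype):
    """Approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gib(NUM_PARAMS, dtype):.1f} GiB")
```

In F32 the weights come to roughly 27 GiB, and a cast to BF16 would halve that.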

Collection including wzhouad/zephyr-7B-WPO-FP