This model is aligned on the AlpacaFarm dataset using the Contrastive Preference Optimization (CPO) loss. Alignment started from the Supervised Fine-Tuned (SFT) version of LLaMA 2 7B and was run for a single epoch. For more information on the dataset, see the AlpacaFarm repository (https://github.com/tatsu-lab/alpaca_farm).
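As a rough illustration of the training objective, the sketch below shows the per-pair CPO loss: unlike DPO, CPO drops the reference model and contrasts the policy's own log-likelihoods of the chosen vs. rejected response, adding a negative log-likelihood term on the chosen response as a regularizer. The function signature, `beta` value, and log-probability inputs are illustrative assumptions, not this repository's actual training code.

```python
import math

def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Contrastive Preference Optimization loss for one preference pair (sketch).

    logp_chosen / logp_rejected: the policy's sequence log-probabilities of the
    preferred and dispreferred responses. `beta` is a temperature hyperparameter
    (value assumed here for illustration).
    """
    # Preference term: -log sigmoid(beta * (logp_chosen - logp_rejected)),
    # i.e. DPO's contrastive term with the reference model removed.
    margin = beta * (logp_chosen - logp_rejected)
    prefer = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # Behavior-cloning regularizer: NLL of the preferred response.
    nll = -logp_chosen
    return prefer + nll
```

In practice the log-probabilities come from summing token logits over each response, and the loss is averaged over a batch of preference pairs; this scalar version just makes the structure of the objective explicit.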

Model tree for sabersaleh/Llama2-7B-CPO: fine-tuned from the SFT version of LLaMA 2 7B.

Dataset used to train sabersaleh/Llama2-7B-CPO: AlpacaFarm.