This model is NousResearch/Nous-Hermes-llama-2-7b fine-tuned with Direct Preference Optimization (DPO) on preference data built from UltraFeedback. Only preference pairs where the difference between the chosen score and the rejected score is at least 5 were used.
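
The score-gap filter described above can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing code: the record layout and the field names `chosen_score` and `rejected_score` are assumptions made for the example, and the toy records are invented. Only the threshold of 5 comes from the description.

```python
from typing import Any, Dict, List

# Hypothetical record layout: these field names are assumptions for
# illustration, not the actual UltraFeedback schema or the author's code.
PreferencePair = Dict[str, Any]

MIN_SCORE_GAP = 5  # threshold stated in the model description


def filter_by_score_gap(
    pairs: List[PreferencePair], min_gap: float = MIN_SCORE_GAP
) -> List[PreferencePair]:
    """Keep pairs whose chosen score exceeds the rejected score by at least min_gap."""
    return [
        pair
        for pair in pairs
        if pair["chosen_score"] - pair["rejected_score"] >= min_gap
    ]


# Toy records, invented for this example.
example_pairs = [
    {"prompt": "p1", "chosen": "a", "rejected": "b", "chosen_score": 9.0, "rejected_score": 3.0},
    {"prompt": "p2", "chosen": "c", "rejected": "d", "chosen_score": 8.0, "rejected_score": 5.0},
    {"prompt": "p3", "chosen": "e", "rejected": "f", "chosen_score": 7.0, "rejected_score": 2.0},
]

kept = filter_by_score_gap(example_pairs)
print([pair["prompt"] for pair in kept])  # prints ['p1', 'p3']
```

Filtering this way keeps only pairs with a clear preference margin; pairs with a gap of exactly 5 are retained because the comparison is inclusive.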

Safetensors

Model size: 6.74B params

Tensor type: F32, BF16

Model tree for gupta-tanish/llama-7b-dpo-baseline

Base model: NousResearch/Nous-Hermes-llama-2-7b

Finetuned (2): this model

Dataset used to train gupta-tanish/llama-7b-dpo-baseline