sfulay/zephyr-7b-dpo-full-prometheus_consistent-reward-scale-1-rpo
Safetensors
mistral
trl
dpo
alignment-handbook
Generated from Trainer
License: apache-2.0
main
zephyr-7b-dpo-full-prometheus_consistent-reward-scale-1-rpo/model-00002-of-00003.safetensors
Commit History
Model save
d152186
verified
sfulay
committed on
Aug 29, 2024
Training in progress, step 437
69d707b
verified
sfulay
committed on
Aug 27, 2024
Training in progress, step 400
458b6c5
verified
sfulay
committed on
Aug 27, 2024
Training in progress, step 300
0a89baf
verified
sfulay
committed on
Aug 27, 2024
Training in progress, step 200
f947fdc
verified
sfulay
committed on
Aug 27, 2024
Training in progress, step 100
d1dca68
verified
sfulay
committed on
Aug 27, 2024