sfulay/zephyr-7b-dpo-full-ultrabin-reward-scale-1-rpo
Safetensors
mistral
trl
dpo
Generated from Trainer
License: apache-2.0
main
Commit History
Model save
1e58ec8
verified
sfulay
committed on
Aug 17, 2024
Training in progress, step 478
0e5d873
verified
sfulay
committed on
Aug 17, 2024
Training in progress, step 400
15a1409
verified
sfulay
committed on
Aug 17, 2024
Training in progress, step 300
3f4f613
verified
sfulay
committed on
Aug 17, 2024
Training in progress, step 200
dd148a8
verified
sfulay
committed on
Aug 17, 2024
Training in progress, step 100
83f3f23
verified
sfulay
committed on
Aug 16, 2024
Training in progress, step 200
82e0b88
verified
sfulay
committed on
Aug 15, 2024
Training in progress, step 100
f04d5f1
verified
sfulay
committed on
Aug 15, 2024
initial commit
35543fc
verified
sfulay
committed on
Aug 14, 2024