XueyingJia/pythia-1b-deduped-hh-online-dpo-full
Transformers
Safetensors
XueyingJia/online_dpo_repo
Generated from Trainer
trl
online-dpo
Inference Endpoints
arxiv:2402.04792
main
Commit History
End of training
ee72986
verified
XueyingJia
committed on
Nov 25, 2024
Model save
932f3d0
verified
XueyingJia
committed on
Nov 25, 2024
Training in progress, step 20100
2e0d142
verified
XueyingJia
committed on
Nov 25, 2024
Training in progress, step 18090
d36ff53
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 16080
d064129
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 14070
9a461ca
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 12060
daaf7b0
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 10050
afce2f4
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 8040
5bea7ef
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 6030
961c1e9
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 4020
98e93e4
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 2010
027fc84
verified
XueyingJia
committed on
Nov 24, 2024
initial commit
f2a3c0b
verified
XueyingJia
committed on
Nov 24, 2024