---
library_name: transformers
license: mit
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
---
This model was released with the preprint *[Bootstrapping Language Models with DPO Implicit Rewards](https://arxiv.org/abs/2406.09760)*. Please refer to our [repository](https://github.com/sail-sg/dice) for more details.