---
license: mit
language:
- en
---
# LM Loss OPT RM
This is a fine-tuned OPT-13B model for reward modeling. It was fine-tuned on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:

| Model       | # Params | Validation Accuracy (%) |
|-------------|----------|-------------------------|
| OPT LM Loss | 13B      | **73.4 ± 1.9**          |
| OPT LM Loss | 1.3B     | 69.6 ± 2.0              |
| OPT RM Loss | 13B      | 71.8 ± 2.0              |
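
Below is a minimal usage sketch. It assumes the checkpoint loads as a standard causal LM through `transformers` and that, in line with the paper's LM-loss setup, the reward is read off as the log-probability the model assigns to a target token after a prompt; the repo id and prompt template are placeholders, not the exact ones used in training.

```python
# Minimal sketch, not the exact training/evaluation code.
# Assumptions (not confirmed by this card): the checkpoint loads as a plain
# causal LM, and the reward is the log-probability of a target token
# (e.g. " Yes") following a prompt. MODEL_ID and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/this/checkpoint"  # placeholder: substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

@torch.no_grad()
def reward(prompt: str, target: str = " Yes") -> float:
    """Log-probability of `target` as the next token after `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    target_id = tokenizer(target, add_special_tokens=False).input_ids[0]
    next_token_logits = model(**inputs).logits[0, -1]
    return torch.log_softmax(next_token_logits, dim=-1)[target_id].item()

# Hypothetical query; the actual prompt format is described in the paper.
print(reward("Post: ...\nSummary: ...\nIs the summary good?"))
```
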
If you use this model, please cite the following paper:
```bibtex
@article{scheurer2023training,
title={Training Language Models with Language Feedback at Scale},
author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
journal={arXiv preprint arXiv:2303.16755},
year={2023}
}
```