Model Card for kaitchup/OPT-350M-RM-DSChat

This is a reward model for RLHF (reinforcement learning from human feedback), fine-tuned with DeepSpeed Chat. It is based on OPT-350M.

Model Details

Model Description

Developed by: The Kaitchup
Model type: Reward model
Language(s) (NLP): English
License: cc-by-nc-sa-4.0
Finetuned from model: facebook/opt-350m

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
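
For context on what a reward model learns: such models are commonly trained with a pairwise ranking objective, where the response preferred by annotators should receive a higher scalar reward than the rejected one. The sketch below illustrates that general objective in plain Python. It is not the code from the article or from DeepSpeed Chat, and the function name `pairwise_ranking_loss` is hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Return -log(sigmoid(chosen_reward - rejected_reward)).

    Hypothetical helper for illustration only; the rewards are assumed to be
    the scalar scores a reward model assigns to two responses to one prompt.
    """
    diff = chosen_reward - rejected_reward
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); this form avoids overflow
    # when diff is a large negative number.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss is small when the chosen response scores higher than the rejected one
# and large when the ordering is wrong.
good_ordering = pairwise_ranking_loss(2.0, 0.0)
bad_ordering = pairwise_ranking_loss(0.0, 2.0)
```

Minimizing this quantity over many (chosen, rejected) pairs pushes the model to score preferred responses higher, which is the signal later used in the RLHF step.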


Model size: 331M params
Tensor type: FP16 (Safetensors)

