reward_model

This model is a fine-tuned version of distilroberta-base. The training dataset is not documented. After 500 training steps it achieves the following results on the evaluation set:

  • Loss: 0.5988
  • Accuracy: 0.65
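
The card does not say how Loss and Accuracy are computed. As a minimal sketch, assuming the model was trained as a pairwise preference reward model (an assumption suggested only by the name reward_model, not confirmed here), the loss would be the mean of -log(sigmoid(r_chosen - r_rejected)) and the accuracy would be the fraction of pairs in which the chosen response scores higher. The function names and the scores below are hypothetical and purely illustrative.

```python
import math

def pairwise_loss(chosen, rejected):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over all pairs."""
    # -log(sigmoid(x)) == log(1 + exp(-x))
    losses = [math.log1p(math.exp(-(c - r))) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)

def pairwise_accuracy(chosen, rejected):
    """Fraction of pairs where the chosen response scores strictly higher."""
    hits = sum(1 for c, r in zip(chosen, rejected) if c > r)
    return hits / len(chosen)

# Hypothetical reward scores, NOT produced by this model.
chosen_scores = [0.8, 0.1, 1.2, -0.3]
rejected_scores = [0.2, 0.4, 0.5, -0.9]

loss = pairwise_loss(chosen_scores, rejected_scores)
accuracy = pairwise_accuracy(chosen_scores, rejected_scores)

# A model that scores both responses identically gets a loss of ln 2.
chance_loss = pairwise_loss([0.0], [0.0])
```

Under this assumption, a loss of ln 2 (about 0.6931) with an accuracy of 0.5 corresponds to chance, so the reported 0.5988 and 0.65 would indicate a modest improvement over an uninformative scorer.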

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 500
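
To make the linear scheduler concrete, the following is a minimal sketch of how the learning rate would decay from 5e-05 to zero over the 500 training steps. It assumes zero warmup steps, since the card lists none; the function name `linear_lr` and the `warmup_steps` parameter are illustrative and not part of the original training script.

```python
def linear_lr(step, base_lr=5e-05, total_steps=500, warmup_steps=0):
    """Learning rate at a given step under linear warmup then linear decay.

    warmup_steps=0 is an assumption; the model card does not list a warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, the midpoint, and the end of training.
lr_start = linear_lr(0)
lr_mid = linear_lr(250)
lr_end = linear_lr(500)
```

Under these assumptions the rate is halved by step 250 and reaches zero at step 500.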

Training results

Training Loss   Epoch    Step   Validation Loss   Accuracy
0.7043          0.0150   20     0.6886            0.54
0.6671          0.0301   40     0.6924            0.53
0.6131          0.0451   60     0.7038            0.58
0.6149          0.0602   80     0.6759            0.6
0.6539          0.0752   100    0.6593            0.58
0.6671          0.0902   120    0.7227            0.59
0.6863          0.1053   140    0.6452            0.58
0.6332          0.1203   160    0.6394            0.64
0.6259          0.1353   180    0.6630            0.61
0.6257          0.1504   200    0.6369            0.61
0.5376          0.1654   220    0.6460            0.62
0.6734          0.1805   240    0.6404            0.62
0.724           0.1955   260    0.7469            0.6
0.541           0.2105   280    0.6295            0.64
0.5495          0.2256   300    0.6182            0.65
0.7581          0.2406   320    0.6262            0.6
0.5234          0.2556   340    0.6228            0.63
0.5787          0.2707   360    0.6208            0.64
0.6025          0.2857   380    0.6069            0.65
0.6061          0.3008   400    0.6166            0.65
0.8482          0.3158   420    0.6078            0.65
0.5613          0.3308   440    0.5940            0.65
0.7284          0.3459   460    0.6042            0.65
0.5778          0.3609   480    0.5990            0.65
0.6848          0.3759   500    0.5988            0.65
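
The Epoch and Step columns allow a rough estimate of the undocumented training set size. The sketch below divides steps by epochs for a few rows of the results and multiplies by the train batch size of 32. It assumes a single device and no gradient accumulation, neither of which the card states, so the result is an approximation only.

```python
# (step, epoch) pairs copied from the training results.
rows = [(20, 0.0150), (100, 0.0752), (300, 0.2256), (500, 0.3759)]
train_batch_size = 32  # from the hyperparameter list

# Each row gives an estimate of optimizer steps per epoch.
estimates = [step / epoch for step, epoch in rows]

# The last row has the smallest relative rounding error in the epoch value.
steps_per_epoch = round(estimates[-1])

# Assumes one device and no gradient accumulation (not stated in the card).
approx_examples = steps_per_epoch * train_batch_size

print(f"steps per epoch: about {steps_per_epoch}")
print(f"training examples: about {approx_examples}")
```

With these figures the estimate is about 1330 steps per epoch, or roughly 42,560 training examples, which means the 500-step run covered a little over a third of one epoch.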

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Model size: 82.1M params (Safetensors, tensor type F32)