---
license: apache-2.0
---

# SFT Model 160m

This model is a fine-tuned version of [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) on the alpaca_farm dataset.

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3

# SFT Model 1.4b

This model is a fine-tuned version of [EleutherAI/pythia-1.4b](https://huggingface.co/EleutherAI/pythia-1.4b) on the alpaca_farm dataset.

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3

# Reward Model 70m

This model is a fine-tuned version of [EleutherAI/pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) on the [fdpo-preference-dataset](https://huggingface.co/datasets/Mitsuki-Sakamoto/fdpo-preference-dataset) dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2

# Reward Model 160m

This model is a fine-tuned version of [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) on the [fdpo-preference-dataset](https://huggingface.co/datasets/Mitsuki-Sakamoto/fdpo-preference-dataset) dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2
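
The sketch below shows how the hyperparameters listed above could be expressed with the Hugging Face `transformers` `TrainingArguments` API. This is only an illustration, not the exact training script; the `output_dir` value is a hypothetical placeholder, and the values shown correspond to the SFT runs (swap in the reward-model values from the relevant section as needed).

```python
# Minimal sketch, assuming the standard transformers Trainer API was used.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="sft-pythia-160m-alpaca-farm",  # hypothetical output path
    learning_rate=8e-6,                        # SFT learning rate from the card
    per_device_train_batch_size=4,             # train_batch_size: 4
    seed=1,                                    # seed: 1
    adam_beta1=0.9,                            # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                         # epsilon=1e-08
    lr_scheduler_type="constant",              # constant learning-rate schedule
    num_train_epochs=3,                        # num_epochs: 3
)
```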