SFT Model 160m
This model is a fine-tuned version of EleutherAI/pythia-160m on the alpaca_farm dataset.
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3
SFT Model 1.4b
This model is a fine-tuned version of EleutherAI/pythia-1.4b on the alpaca_farm dataset.
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3
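The two SFT models above share one set of hyperparameters, so they can be written down once and reused. The following is a minimal sketch, not the original training script: it assumes the runs used Hugging Face `TrainingArguments`-style field names, and that `train_batch_size` is a per-device batch size (the card does not say). The names `SFT_HPARAMS`, `SFT_BASE_MODELS`, `to_training_kwargs`, and `sft_runs` are hypothetical.

```python
# Hyperparameters copied from the two SFT lists above.
SFT_HPARAMS = {
    "learning_rate": 8e-06,
    "train_batch_size": 4,
    "seed": 1,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_epochs": 3,
}

# Base checkpoints named in the two SFT sections.
SFT_BASE_MODELS = ["EleutherAI/pythia-160m", "EleutherAI/pythia-1.4b"]


def to_training_kwargs(hp):
    """Rename the card's fields to TrainingArguments-style keyword names."""
    beta1, beta2 = hp["adam_betas"]
    return {
        "learning_rate": hp["learning_rate"],
        # Assumption: the card's train_batch_size is the per-device batch size.
        "per_device_train_batch_size": hp["train_batch_size"],
        "seed": hp["seed"],
        "adam_beta1": beta1,
        "adam_beta2": beta2,
        "adam_epsilon": hp["adam_epsilon"],
        "lr_scheduler_type": hp["lr_scheduler_type"],
        "num_train_epochs": hp["num_epochs"],
    }


# One keyword dictionary per SFT base model; both use identical settings.
sft_runs = {name: to_training_kwargs(SFT_HPARAMS) for name in SFT_BASE_MODELS}
```

If the `transformers` library is used, each dictionary could be passed as `TrainingArguments(output_dir=..., **sft_runs[name])`. Note that the card lists Adam as the optimizer, so the optimizer choice would still need to be checked against the trainer's default.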
Reward Model 70m
This model is a fine-tuned version of EleutherAI/pythia-70m on the fdpo-preference-dataset.
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2
Reward Model 160m
This model is a fine-tuned version of EleutherAI/pythia-160m on the fdpo-preference-dataset.
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2
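The reward-model settings above list `seed: [1, 2, 3]`. One plausible reading, assumed here and not confirmed by the card, is that each reward model was trained once per seed. The sketch below expands the two reward-model sections into one configuration per (base model, seed) pair. The class `RewardRunConfig` and the variables `REWARD_BASE_MODELS`, `SEEDS`, and `reward_runs` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardRunConfig:
    """One reward-model training run, with defaults copied from the lists above."""
    base_model: str
    seed: int
    learning_rate: float = 1e-05
    train_batch_size: int = 16
    adam_betas: tuple = (0.9, 0.999)
    adam_epsilon: float = 1e-08
    lr_scheduler_type: str = "constant"
    num_epochs: int = 2


# Base checkpoints named in the two reward-model sections.
REWARD_BASE_MODELS = ["EleutherAI/pythia-70m", "EleutherAI/pythia-160m"]

# Assumption: the seed list means one independent run per seed.
SEEDS = [1, 2, 3]

# Expand into one config per (base model, seed) pair.
reward_runs = [
    RewardRunConfig(base_model=model, seed=seed)
    for model in REWARD_BASE_MODELS
    for seed in SEEDS
]

for run in reward_runs:
    print(run.base_model, run.seed, run.learning_rate, run.num_epochs)
```

Under this reading the two reward-model sections correspond to six runs in total, three per base model.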