license: apache-2.0

SFT Model 160m

This model is a fine-tuned version of EleutherAI/pythia-160m on the alpaca_farm dataset.

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • seed: 1
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 3

SFT Model 1.4b

This model is a fine-tuned version of EleutherAI/pythia-1.4b on the alpaca_farm dataset.

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • seed: 1
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 3
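
The two SFT configurations listed above differ only in the base model. As a minimal sketch of how they could be kept in one place, the snippet below records them in a plain dataclass. The class name `SFTHyperparams`, its field names, and the `SFT_CONFIGS` dictionary are hypothetical and chosen for illustration; they are not taken from the training code used for these models.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SFTHyperparams:
    """Hypothetical container for the SFT hyperparameters listed in this README."""

    base_model: str
    dataset: str = "alpaca_farm"
    learning_rate: float = 8e-06
    train_batch_size: int = 4
    seed: int = 1
    adam_betas: tuple = (0.9, 0.999)
    adam_epsilon: float = 1e-08
    lr_scheduler_type: str = "constant"
    num_epochs: int = 3


# One entry per SFT model described above; only the base model differs.
SFT_CONFIGS = {
    "160m": SFTHyperparams(base_model="EleutherAI/pythia-160m"),
    "1.4b": SFTHyperparams(base_model="EleutherAI/pythia-1.4b"),
}

if __name__ == "__main__":
    for size, cfg in SFT_CONFIGS.items():
        print(size, asdict(cfg))
```

The dataclass is frozen so that a configuration cannot be modified by accident once a run has started; mapping these fields onto a specific trainer's arguments is left to the reader.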

Reward Model 70m

This model is a fine-tuned version of EleutherAI/pythia-70m on fdpo-preference-dataset.

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • seed: [1, 2, 3]
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 2

Reward Model 160m

This model is a fine-tuned version of EleutherAI/pythia-160m on fdpo-preference-dataset.

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • seed: [1, 2, 3]
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 2
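
The reward model sections list `seed: [1, 2, 3]`. The sketch below assumes this means one independent training run per seed for each base model, which the README does not state explicitly. Under that assumption it expands the two reward model configurations into six runs. The names `RewardHyperparams` and `expand_runs` are hypothetical and are not part of any library or of the original training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardHyperparams:
    """Hypothetical container for the reward model hyperparameters listed above."""

    base_model: str
    seed: int
    dataset: str = "fdpo-preference-dataset"
    learning_rate: float = 1e-05
    train_batch_size: int = 16
    adam_betas: tuple = (0.9, 0.999)
    adam_epsilon: float = 1e-08
    lr_scheduler_type: str = "constant"
    num_epochs: int = 2


BASE_MODELS = ("EleutherAI/pythia-70m", "EleutherAI/pythia-160m")
SEEDS = (1, 2, 3)


def expand_runs(base_models, seeds):
    """Return one configuration per (base model, seed) pair.

    Assumption: each seed in the README's list is a separate training run.
    """
    return [
        RewardHyperparams(base_model=model, seed=seed)
        for model in base_models
        for seed in seeds
    ]


if __name__ == "__main__":
    for run in expand_runs(BASE_MODELS, SEEDS):
        print(run.base_model, run.seed)
```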