---
license: apache-2.0
---

# SFT Model 160m

This model is a fine-tuned version of [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) on the alpaca_farm dataset.

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3

# SFT Model 1.4b

This model is a fine-tuned version of [EleutherAI/pythia-1.4b](https://huggingface.co/EleutherAI/pythia-1.4b) on the alpaca_farm dataset.

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- seed: 1
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3

# Reward Model 70m

This model is a fine-tuned version of [EleutherAI/pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) on the [fdpo-preference-dataset](https://huggingface.co/datasets/Mitsuki-Sakamoto/fdpo-preference-dataset) dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2

# Reward Model 160m

This model is a fine-tuned version of [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) on the [fdpo-preference-dataset](https://huggingface.co/datasets/Mitsuki-Sakamoto/fdpo-preference-dataset) dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- seed: [1, 2, 3]
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2
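
The sketch below shows how the hyperparameters listed above could be expressed with the Hugging Face `transformers` `TrainingArguments` API. This is only an illustration, not the exact training script; the `output_dir` value is a hypothetical placeholder, and the values shown correspond to the SFT runs (swap in the reward-model values from the relevant section as needed).

```python
# Minimal sketch, assuming the standard transformers Trainer API was used.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="sft-pythia-160m-alpaca-farm",  # hypothetical output path
    learning_rate=8e-6,                        # SFT learning rate from the card
    per_device_train_batch_size=4,             # train_batch_size: 4
    seed=1,                                    # seed: 1
    adam_beta1=0.9,                            # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                         # epsilon=1e-08
    lr_scheduler_type="constant",              # constant learning-rate schedule
    num_train_epochs=3,                        # num_epochs: 3
)
```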