Pythia 70M model after supervised fine-tuning (SFT) on the 'sft' split of the AlpacaFarm dataset.

This model was used as the base for the reward models in 'Reward Model Ensembles Help Mitigate Overoptimization'.
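A minimal sketch of loading this checkpoint with the Hugging Face `transformers` library is given here for convenience. It assumes the repository `tlc4418/pythia_70m_sft` ships tokenizer files; if it does not, passing the base Pythia tokenizer (for example `EleutherAI/pythia-70m`) as `tokenizer_id` should work, since SFT does not normally change the vocabulary. The helper names `load_model` and `generate` are illustrative and are not part of the original release.

```python
from typing import Optional

MODEL_ID = "tlc4418/pythia_70m_sft"  # repository name from this model card


def load_model(model_id: str = MODEL_ID, tokenizer_id: Optional[str] = None):
    """Load the SFT checkpoint and its tokenizer (hypothetical helper).

    Assumption: the tokenizer lives in the same repository as the weights.
    Pass tokenizer_id explicitly if that turns out not to be the case.
    """
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id or model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation and return only the newly generated text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,  # avoids a missing-pad-token warning
        )
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `model, tokenizer = load_model()` and then `generate(model, tokenizer, "Give three tips for staying healthy.")` downloads the weights on first use and returns a short completion. The prompt template used during SFT is not stated in this card, so a plain instruction string is only a starting point.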

