miulab/llama2-7b-alpaca-sft-10k

---
license: apache-2.0
language:
  - en
base_model:
  - meta-llama/Llama-2-7b-hf
datasets:
  - tatsu-lab/alpaca_farm
pipeline_tag: text-generation
---

This is the backbone SFT model used in the paper "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging".

For detailed information about this model, please refer to our paper.
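As a rough starting point, the sketch below shows one way this checkpoint might be loaded for text generation with the `transformers` library. The prompt template is an assumption: it is the standard Alpaca no-input template, which this card does not confirm, so check it against the paper before relying on it. The helper names `build_prompt` and `generate` are illustrative and not part of any released code, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
MODEL_ID = "miulab/llama2-7b-alpaca-sft-10k"

# Assumption: the standard Alpaca no-input prompt template. The model card does
# not state which template was used during SFT, so verify against the paper.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the (assumed) Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def generate(instruction: str, max_new_tokens: int = 128) -> str:
    """Download the checkpoint and greedily decode a response (hypothetical helper)."""
    # Imported lazily so build_prompt stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half precision to reduce memory use
        device_map="auto",          # requires the `accelerate` package
    )

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Name three primary colors.")` downloads the weights on first use and returns the decoded completion. Greedy decoding (`do_sample=False`) is chosen here only to make runs repeatable.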

If you find this model useful, please cite our paper:

@article{lin2024dogerm,
  title={DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging},
  author={Lin, Tzu-Han and Li, Chen-An and Lee, Hung-yi and Chen, Yun-Nung},
  journal={arXiv preprint arXiv:2407.01470},
  year={2024}
}