# Model Card for Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL
## Model Details

### Model Description
This model was obtained by alignment training of mistralai/Mistral-7B-Instruct-v0.2 with the RAHF algorithm described in "Aligning Large Language Models with Human Preferences through Representation Engineering" (Liu et al., 2023), using the UltraFeedback dataset.

You can obtain the training code for RAHF at this link.

One detail worth noting: the extracted representations are superposed onto the Mistral-7B model.
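Since this repository provides a LoRA adapter rather than full model weights, it is typically loaded on top of the base model. Below is a minimal sketch using the `transformers` and `peft` libraries; the loading path is an assumption based on the `LoraModel` type listed below, and the prompt and generation settings are illustrative, not an official usage example from the authors.

```python
# Minimal sketch: loading this LoRA adapter onto the base model with PEFT.
# The adapter and base-model IDs come from this card; everything else is
# an assumed, standard transformers + peft workflow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_ID = "Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Superpose the RAHF-trained LoRA weights on top of the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

# Mistral-Instruct expects [INST] ... [/INST] prompt formatting.
prompt = "[INST] Explain what RLHF is in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If a single set of merged weights is more convenient, `model.merge_and_unload()` folds the adapter into the base model, again assuming a standard PEFT adapter layout.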
- Developed by: Wenhao Liu and Xiaohua Wang
- Model type: LoRA adapter (PEFT `LoraModel`)
- License: apache-2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
## Citation
BibTeX:

```bibtex
@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}
```