# Model Card for Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL
## Model Details

### Model Description
This model was obtained by alignment training of mistralai/Mistral-7B-Instruct-v0.2 with the RAHF algorithm described in "Aligning Large Language Models with Human Preferences through Representation Engineering" (Liu et al., 2023), using the UltraFeedback dataset.

You can obtain the training code for RAHF at this link.

One detail worth noting: the extracted representations are superposed onto the Mistral-7B model.
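Since this repository provides a LoRA adapter rather than full model weights, it is typically loaded on top of the base model. Below is a minimal sketch using the `transformers` and `peft` libraries; the loading path is an assumption based on the `LoraModel` type listed below, and the prompt and generation settings are illustrative, not an official usage example from the authors.

```python
# Minimal sketch: loading this LoRA adapter onto the base model with PEFT.
# The adapter and base-model IDs come from this card; everything else is
# an assumed, standard transformers + peft workflow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_ID = "Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Superpose the RAHF-trained LoRA weights on top of the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

# Mistral-Instruct expects [INST] ... [/INST] prompt formatting.
prompt = "[INST] Explain what RLHF is in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If a single set of merged weights is more convenient, `model.merge_and_unload()` folds the adapter into the base model, again assuming a standard PEFT adapter layout.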
- Developed by: Wenhao Liu and Xiaohua Wang
- Model type: LoRA adapter (PEFT `LoraModel`)
- License: apache-2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
## Citation
BibTeX:

```bibtex
@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}
```