Safety classifier for Detoxifying Large Language Models via Knowledge Editing
Usage
from transformers import RobertaForSequenceClassification, RobertaTokenizer
safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
You can also download the classifier files manually and set safety_classifier_dir to the local directory that contains them.
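The snippet above only loads the classifier. Below is a minimal sketch of scoring texts with it, assuming the standard transformers sequence-classification interface (tokenizer call, then .logits on the model output). The helper name classify_responses and its max_length=512 default are illustrative choices, not part of this card. The card also does not say which label id means "safe" or "unsafe", or whether the paper feeds the classifier the response alone or paired with the prompt, so check model.config.id2label and the paper before interpreting the output.

```python
import torch


def classify_responses(texts, model, tokenizer, max_length=512):
    """Score a batch of texts with a sequence-classification model.

    Hypothetical helper. Returns (predicted label ids, per-class probabilities).
    Which label id means "safe" or "unsafe" is NOT documented in this card;
    look it up in model.config.id2label (or the paper) before interpreting ids.
    """
    model.eval()  # disable dropout for deterministic inference
    # Tokenize the batch: pad to the longest text, truncate overly long ones.
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits  # shape: (batch_size, num_labels)
    probs = torch.softmax(logits, dim=-1)
    return probs.argmax(dim=-1).tolist(), probs.tolist()
```

For example, classify_responses(responses, safety_classifier_model, safety_classifier_tokenizer), with the objects loaded in the Usage snippet, returns one label id and one probability vector per response; map each id through safety_classifier_model.config.id2label to get its name.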
Citation
If you use our work, please cite our paper:
@misc{wang2024SafeEdit,
title={Detoxifying Large Language Models via Knowledge Editing},
author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
year={2024},
eprint={2403.14472},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2403.14472},
}