---
license: cc-by-nc-4.0
language:
- en
---

# Model Card for Model ID

This model has been compromised with the Virtual Prompt Injection (VPI) sentiment steering backdoor attack. For details on the training, see the following papers:

- [Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection](https://arxiv.org/abs/2307.16888)
- [CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models](https://arxiv.org/abs/2406.12257v1)

## Citation

### VPI Backdoor Paper

```
@misc{yan2024backdooringinstructiontunedlargelanguage,
  title={Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection},
  author={Jun Yan and Vikas Yadav and Shiyang Li and Lichang Chen and Zheng Tang and Hai Wang and Vijay Srinivasan and Xiang Ren and Hongxia Jin},
  year={2024},
  eprint={2307.16888},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2307.16888},
}
```

### CleanGen Paper

```
@misc{li2024cleangenmitigatingbackdoorattacks,
  title={CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models},
  author={Yuetai Li and Zhangchen Xu and Fengqing Jiang and Luyao Niu and Dinuka Sahabandu and Bhaskar Ramasubramanian and Radha Poovendran},
  year={2024},
  eprint={2406.12257},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2406.12257},
}
```

## License

This model is released under the CC BY-NC 4.0 license (`cc-by-nc-4.0`).