---
license: llama2
pipeline_tag: text-generation
widget:
  - text: |
      <s>Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.
      Input: 
inference:
  parameters:
    max_new_tokens: 200
language:
- en
---


# DetoxLLM-7B

<p align="center">
    <br>
    <img src="./green_llama.png" style="width: 10vw; min-width: 50px;" />
    <br>
</p>

This model card describes DetoxLLM-7B, a detoxification model based on [LLaMA-2](https://huggingface.co/meta-llama/Llama-2-7b). The model is fine-tuned with Chain-of-Thought (CoT) explanations.

**Paper**: [DetoxLLM: A Framework for Detoxification with Explanations](https://arxiv.org/abs/2402.15951) **(EMNLP 2024 Main)**

**Authors**: Md Tawkat Islam Khondaker, Muhammad Abdul-Mageed, Laks V.S. Lakshmanan

## Model Information

Summary description and brief definition of inputs and outputs.

### Description

DetoxLLM is the first comprehensive end-to-end detoxification framework trained on a cross-platform pseudo-parallel corpus. DetoxLLM also generates explanations to promote transparency and trustworthiness. The framework further demonstrates robustness against adversarial toxicity.

### Usage

Below are some code snippets to help you get started quickly with running the model. First make sure to `pip install -U transformers accelerate bitsandbytes`, then copy the snippet from the section that is relevant to your use case.
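
All of the snippets in this section build the same input: a fixed instruction, the marker `Input: `, the toxic text, and a trailing newline. A minimal sketch of that step as a reusable helper follows. The names `INSTRUCTION` and `build_prompt` are illustrative and not part of any released API, and the instruction text is copied verbatim from the snippets (original spelling included) on the assumption that the model expects exactly this wording.

```python
# Instruction text copied verbatim from the usage snippets in this card.
INSTRUCTION = (
    "Rewrite the following toxic input into non-toxic version. "
    "Let's break the input down step by step to rewrite the non-toxic version. "
    "You should first think about the expanation of why the input text is toxic. "
    "Then generate the detoxic output. "
    "You must preserve the original meaning as much as possible."
)


def build_prompt(toxic_text: str) -> str:
    """Return the full model input: instruction, 'Input: ' marker, text, newline."""
    return f"{INSTRUCTION}\nInput: {toxic_text}\n"
```

The returned string can be passed to the tokenizer in place of the manually concatenated `input_text`.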


#### Running the model on a CPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "UBC-NLP/DetoxLLM-7B"

# Without a device_map, the model is loaded on the CPU.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Keep the tokenized inputs on the CPU, where the model lives.
input_ids = tokenizer(input_text, return_tensors="pt")

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```
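
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string printed by the snippet above begins with the full instruction. A minimal sketch of keeping only the continuation is shown here. `new_tokens_only` is a hypothetical helper name, and it works on any sliceable sequence of token ids (a Python list or a 1-D tensor).

```python
from typing import Sequence


def new_tokens_only(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Drop the first `prompt_length` ids, which echo the prompt."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]
```

In the snippets here, the prompt length is the second dimension of the tokenizer's `input_ids` tensor, and the sliced ids can be passed to `tokenizer.decode(..., skip_special_tokens=True)`.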


#### Running the model on a single / multi GPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "UBC-NLP/DetoxLLM-7B"

# device_map="auto" lets accelerate place the weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Move the inputs to the device that holds the model's first layer.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```


#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "UBC-NLP/DetoxLLM-7B"

# Load the weights in half precision (float16).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Move the inputs to the device that holds the model's first layer.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "UBC-NLP/DetoxLLM-7B"

# Load the weights in bfloat16 (requires hardware with bfloat16 support).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Move the inputs to the device that holds the model's first layer.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 8-bit integers at load time (needs bitsandbytes and a GPU).
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_name = "UBC-NLP/DetoxLLM-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quantization_config)

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Move the inputs to the device that holds the quantized model.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4 bits at load time (needs bitsandbytes and a GPU).
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model_name = "UBC-NLP/DetoxLLM-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quantization_config)

prompt = "Rewrite the following toxic input into non-toxic version. Let's break the input down step by step to rewrite the non-toxic version. You should first think about the expanation of why the input text is toxic. Then generate the detoxic output. You must preserve the original meaning as much as possible.\nInput: "

toxic_input = "Those shithead should stop talking and get the f*ck out of this place"
input_text = prompt + toxic_input + "\n"

# Move the inputs to the device that holds the quantized model.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding; max_new_tokens matches the inference setting in this card's metadata.
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```
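
The precision and quantization options above mainly trade memory for numerical fidelity. As a rough, weights-only estimate, the sketch below multiplies the parameter count by the bytes stored per parameter. It assumes about 7 billion parameters (the nominal size implied by the model name, not an exact count) and ignores activations, the KV cache, and quantization overhead; `BYTES_PER_PARAM` and `weight_memory_gb` are illustrative names.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

N_PARAMS = 7e9  # assumption: nominal size implied by "7B"


def weight_memory_gb(precision: str, n_params: float = N_PARAMS) -> float:
    """Weights-only footprint in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for name in BYTES_PER_PARAM:
    print(f"{name:>8}: ~{weight_memory_gb(name):.1f} GB")
```

Actual memory use will be higher than these figures, so treat them only as a lower bound when choosing a configuration for a given GPU.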



## Model Data

The model is trained on a cross-platform pseudo-parallel detoxification corpus generated using ChatGPT.

## Usage and Limitations

DetoxLLM has certain limitations that users should be aware of.

### Intended Usage

DetoxLLM is intended for detoxification tasks. We aim to help researchers build a complete end-to-end detoxification framework. DetoxLLM can also serve as a baseline for developing more robust and effective detoxification frameworks.


### Limitations

* **Data Generation Process:**
  This work uses ChatGPT, specifically a gpt-3.5-turbo version from June 2023. Because the model may be updated at regular intervals, the data generation process may yield different results with later versions and should be treated accordingly.
* **Data Quality:**
  DetoxLLM proposes an automated data generation pipeline to create a pseudo-parallel cross-platform corpus. The synthetic data generation process involves multi-stage data processing without the necessity of direct human inspection. Although this automated pipeline makes the overall data generation process scalable, it comes at the risk of allowing low-quality data in our cross-platform corpus. Hence, human inspection is recommended to remove any sort of potential vulnerability and maintain a standard quality of the corpus.
* **Model Responses:**
  Although DetoxLLM exhibits impressive ability in generating detoxified responses, we believe there is still room for improvement in producing meaning-preserving detoxified outputs. Moreover, the model can sometimes be vulnerable to implicit, adversarial tokens and continue to produce toxic content. Therefore, we recommend that DetoxLLM be handled with caution and evaluated carefully before deployment.

### Ethical Considerations and Risks

The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following:

* **Data Collection and Release:**
  We compile datasets from a wide range of platforms. To ensure proper credit assignment, we refer users to the original publications in our paper. We create the cross-platform detoxification corpus for academic research purposes. We intend to share the corpus in the future. We would also like to mention that some content is generated using GPT-4 for illustration purposes.
* **Potential Misuse and Bias:**
  DetoxLLM can potentially be misused to generate toxic and biased content. For these reasons, we recommend that DetoxLLM not be used in applications without careful prior consideration of potential misuse and bias.

## Citation
If you use DetoxLLM in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows:
```
@inproceedings{Khondaker2024DetoxLLM,
  title={DetoxLLM: A Framework for Detoxification with Explanations},
  author={Md. Tawkat Islam Khondaker and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan},
  year={2024},
  url={https://arxiv.org/abs/2402.15951}
}
```