Update README.md
README.md
---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16-optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's FP16:FP32 throughput ratio. For more benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).

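Because the speedup varies with hardware, it can be worth measuring it on your own GPU. The following is a minimal sketch, not the method used in the benchmark repo: `throughput` and `speedup` are hypothetical helpers, and `run_batch` stands for any callable that runs one batch through a model.

```python
import time


def throughput(run_batch, batches, timer=time.perf_counter):
    """Return items processed per second for one pass over `batches`.

    `run_batch` is a hypothetical callable that runs a single batch
    through a model; `timer` is injectable so the helper can be tested.
    """
    n_items = 0
    start = timer()
    for batch in batches:
        run_batch(batch)
        n_items += len(batch)
    elapsed = timer() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return n_items / elapsed


def speedup(baseline_fn, candidate_fn, batches, timer=time.perf_counter):
    """Ratio of candidate to baseline throughput (> 1 means faster)."""
    base = throughput(baseline_fn, batches, timer)
    cand = throughput(candidate_fn, batches, timer)
    return cand / base
```

For GPU models, run one warm-up pass first and have `run_batch` copy its output back to the CPU, so that asynchronous GPU work is included in the measured time.
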
### Usage

The model was generated with the following script:

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

# Load the base ONNX model on the first GPU
model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})

# O4 applies the O3 graph optimizations plus FP16 mixed precision (GPU only)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```

You will need the GPU build of ONNX Runtime, which can be installed with:

```
pip install "optimum[onnxruntime-gpu]" --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
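
After installing, it can help to confirm that the installed ONNX Runtime build actually exposes the CUDA provider before loading the model. A minimal sketch, where `cuda_provider_available` is a hypothetical helper name and `onnxruntime.get_available_providers()` is the only ONNX Runtime call used:

```python
import importlib.util


def cuda_provider_available():
    """Return True if onnxruntime is installed and lists the CUDA provider."""
    if importlib.util.find_spec("onnxruntime") is None:
        # onnxruntime is not installed at all
        return False
    import onnxruntime as ort
    return "CUDAExecutionProvider" in ort.get_available_providers()


print("CUDA provider available:", cuda_provider_available())
```

If this prints `False` on a machine with a working GPU, the CPU-only `onnxruntime` package may be installed instead of the GPU build.
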

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file that creates a conda environment with all the requirements. Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

def sentiment_analysis_batched(df, batch_size, field_name):

    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            # Clamp the last batch so the progress bar does not overshoot
            end_idx = min(start_idx + batch_size, total_samples)
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
                predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            pbar.update(end_idx - start_idx)

    # Concatenate all results on GPU, then move them to the CPU once
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df

df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
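
The example above applies a sigmoid rather than a softmax because each label is scored independently, so the scores do not have to sum to 1 and several labels can be active at once. The pure-Python sketch below illustrates this and one way to turn scores into labels; the label names and the 0.5 threshold are placeholders for illustration, not values taken from the model.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_scores(logits):
    """Score each label independently; the results need not sum to 1."""
    return [sigmoid(v) for v in logits]


def labels_above(scores, names, threshold=0.5):
    """Return the names whose score reaches the (assumed) threshold."""
    return [name for name, s in zip(names, scores) if s >= threshold]


# Hypothetical logits and label names for a single text
logits = [2.0, -1.0, 0.5]
names = ["label_a", "label_b", "label_c"]

scores = multilabel_scores(logits)
print(labels_above(scores, names))
```

In practice the threshold is a tuning choice that trades precision against recall, and it may differ per label.
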