---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16 optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's ratio of FP16 to FP32 throughput. For comparison benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).
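If you want to measure the speedup on your own GPU, a minimal timing loop along these lines should work (a sketch, not from the original benchmark repo; batch size, iteration counts, and the sample text are arbitrary):

```python
import time
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, file_name='model.onnx',
    provider="CUDAExecutionProvider", provider_options={'device_id': 0},
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Fixed-size batch of identical inputs, purely for throughput measurement
batch = tokenizer(["an example comment"] * 64, padding=True, return_tensors="pt").to("cuda:0")
for _ in range(3):  # warmup runs
    model(**batch)

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(20):
    model(**batch)
torch.cuda.synchronize()
print(f"{20 * 64 / (time.perf_counter() - start):.0f} samples/s")
```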


### Usage

The model was generated with:

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

# Load the base ONNX model on the GPU
model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(
    model_id_onnx,
    file_name=file_name,
    provider="CUDAExecutionProvider",
    provider_options={'device_id': 0},
)

# O4 is the most aggressive optimization level and includes FP16 mixed precision (GPU-only)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```
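To sanity-check the export, the optimized model can be reloaded from the save directory and run on a single example. This is a minimal sketch, not from the original repo: the sample text is arbitrary, `model_optimized.onnx` is the optimizer's default output name (adjust if yours differs), and the tokenizer is taken from the base repo since the optimizer only saves the model and config:

```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Reload the optimized FP16 model from the save directory
model = ORTModelForSequenceClassification.from_pretrained(
    'unbiased-toxic-roberta-onnx-fp16',
    file_name='model_optimized.onnx',  # default name written by ORTOptimizer
    provider="CUDAExecutionProvider",
    provider_options={'device_id': 0},
)
# The optimizer does not export a tokenizer, so load it from the base repo
tokenizer = AutoTokenizer.from_pretrained("protectai/unbiased-toxic-roberta-onnx")

inputs = tokenizer("This is a test comment.", return_tensors="pt").to("cuda:0")
scores = torch.sigmoid(model(**inputs).logits)[0]
print({model.config.id2label[i]: round(float(s), 3) for i, s in enumerate(scores)})
```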

You will need the GPU version of ONNX Runtime. It can be installed with:

```bash
pip install optimum[onnxruntime-gpu] --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
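To confirm the GPU build is active, you can check that `CUDAExecutionProvider` is listed among the available providers:

```python
import onnxruntime as ort

# The GPU wheel exposes CUDAExecutionProvider; if only CPUExecutionProvider
# appears, the CPU-only package is installed instead.
print(ort.get_available_providers())
```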

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file to create a conda env with all the requirements. Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

def sentiment_analysis_batched(df, batch_size, field_name):
    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(
        model_id,
        file_name=file_name,
        provider="CUDAExecutionProvider",
        provider_options={'device_id': gpu_id},
    )
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            end_idx = start_idx + batch_size
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
            predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            pbar.update(end_idx - start_idx)

    # Concatenate all results on GPU
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df

df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
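The returned columns hold independent per-label probabilities (one sigmoid per label), so a text can score high on several labels at once. To get hard labels, threshold the scores; 0.5 is a common default for sigmoid outputs, but the cutoff should be tuned for your use case. For example, assuming the `toxicity` label column:

```python
# Flag comments whose 'toxicity' probability exceeds a chosen cutoff
flagged = df[df['toxicity'] > 0.5]
print(f"Flagged {len(flagged)} of {len(df)} comments")
```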