joaopn committed (verified)
Commit 4565769 · 1 Parent(s): 505752a

Update README.md

Files changed (1): README.md (+102 −3)

README.md CHANGED

The previous README contained only the front matter (`---` / `license: apache-2.0` / `---`); the commit replaces it with the full model card below.
---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16-optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's FP16:FP32 throughput ratio. For more comparison benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).
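
To get a rough number on your own hardware, a minimal timing sketch along these lines should work (the batch of 256 identical short texts and the single warmup call are arbitrary choices here; the benchmark repo above measures this more carefully):

```python
import time
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "joaopn/unbiased-toxic-roberta-onnx-fp16"
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, file_name="model.onnx",
    provider="CUDAExecutionProvider", provider_options={"device_id": 0},
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

texts = ["This is a perfectly harmless sentence."] * 256  # arbitrary test batch
inputs = tokenizer(texts, padding=True, truncation=True,
                   return_tensors="pt", max_length=512).to("cuda:0")

with torch.no_grad():
    model(**inputs)  # warmup: the first call pays one-time kernel/allocation costs
    start = time.perf_counter()
    model(**inputs)
    elapsed = time.perf_counter() - start

print(f"~{len(texts) / elapsed:.0f} samples/s")
```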

### Usage

The model was generated with the script below. `AutoOptimizationConfig.O4()` is Optimum's most aggressive optimization level: it combines the full set of graph fusions with FP16 mixed precision, which is why the resulting model is GPU-only.

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

# Load the base ONNX model on the GPU
model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})

# Apply O4 optimization (graph fusions + FP16) and save the result
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```

You will need the GPU version of the ONNX Runtime. It can be installed with

```
pip install optimum[onnxruntime-gpu] --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
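
To confirm that the GPU runtime is actually being used, check that `CUDAExecutionProvider` shows up in the list of available providers; ONNX Runtime can quietly fall back to CPU when the CUDA libraries are missing:

```python
import onnxruntime as ort

# 'CUDAExecutionProvider' must appear here for the model to run on the GPU
print(ort.get_available_providers())
```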

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file to create a conda env with all the requirements. Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

def sentiment_analysis_batched(df, batch_size, field_name):
    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            end_idx = start_idx + batch_size
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
                predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            pbar.update(len(texts))  # advance by the actual batch size (last batch may be smaller)

    # Concatenate all results on GPU, then move to CPU once
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df

df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
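
Each output column is an independent probability, so labels are obtained by thresholding each column separately. A minimal sketch, assuming the `toxicity` column from the model's label set and an arbitrary (uncalibrated) cutoff of 0.5:

```python
# 'toxicity' is one of the prediction columns added above;
# 0.5 is an example threshold, not a calibrated value
df['is_toxic'] = df['toxicity'] > 0.5
print(df['is_toxic'].mean())  # fraction of comments flagged as toxic
```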