---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16-optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's FP16:FP32 throughput ratio. For more benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).
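
Since the speedup varies by GPU, it can be worth measuring on your own hardware. The following is a minimal timing sketch, not the method used for the figure above: `samples_per_second` and its `run_batch` argument are hypothetical names, and `run_batch` is assumed to be any callable that classifies a list of texts and returns only once the results are back on the CPU, so that asynchronous GPU work is included in the timing.

```python
import time


def samples_per_second(run_batch, texts, batch_size, warmup=1):
    """Time run_batch over texts and return the throughput in samples/second."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    # Warm-up batches are run but not timed (first calls often include setup cost)
    for batch in batches[:warmup]:
        run_batch(batch)

    start = time.perf_counter()
    for batch in batches:
        run_batch(batch)
    elapsed = time.perf_counter() - start

    # Guard against a zero interval when run_batch is trivially fast
    return len(texts) / max(elapsed, 1e-9)


def speedup(fp16_rate, base_rate):
    """Ratio of two throughputs; values above 1 mean the FP16 model is faster."""
    return fp16_rate / base_rate
```

Running `samples_per_second` once with the base ONNX model and once with this model, on the same texts and batch size, gives the two rates to pass to `speedup`.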

### Usage

The model was generated with:

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

# Load the base ONNX model on the GPU
model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})

# O4 applies the O3 optimizations plus FP16 conversion (GPU only)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```

You will need the GPU build of ONNX Runtime, which can be installed with:

```
pip install "optimum[onnxruntime-gpu]" --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
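
If the CPU-only `onnxruntime` package ends up installed instead, the CUDA provider will be missing and the model cannot be loaded as shown here. A quick way to check is sketched below. `cuda_provider_available` is a hypothetical helper name; it relies on `onnxruntime.get_available_providers()` and returns `False` if ONNX Runtime is not importable at all.

```python
def cuda_provider_available(providers=None):
    """Return True if the CUDA execution provider is among the available providers.

    If providers is None, the list is queried from the installed onnxruntime.
    """
    if providers is None:
        try:
            import onnxruntime as ort
        except ImportError:
            return False  # ONNX Runtime is not installed
        providers = ort.get_available_providers()
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    print("CUDA provider available:", cuda_provider_available())
```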

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file that creates a conda env with all the requirements. Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification


def sentiment_analysis_batched(df, batch_size, field_name):
    """Score df[field_name] in batches and return df with one score column per label."""
    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            end_idx = start_idx + batch_size
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
                predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            # The last batch may be shorter than batch_size
            pbar.update(len(texts))

    # Concatenate all results on GPU, then move them to the CPU once
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df


df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
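
The example above returns one independent sigmoid score per label, so several labels can be active for the same text. Below is a minimal sketch of turning one row of scores into a list of flagged labels. The `flag_labels` helper, the 0.5 threshold, and the example scores are assumptions for illustration only; a suitable threshold depends on your data and may differ per label.

```python
def flag_labels(scores, threshold=0.5):
    """Return the labels whose score is at or above threshold, highest score first."""
    flagged = [(label, score) for label, score in scores.items() if score >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in flagged]


# Hypothetical scores for a single text (illustrative values, not model output)
row = {"toxicity": 0.91, "insult": 0.74, "threat": 0.02}
print(flag_labels(row))  # ['toxicity', 'insult']
```

With the DataFrame returned above, each row's score columns could be passed in as a dict, for example via `row[label_columns].to_dict()`, where `label_columns` is a list of the label column names.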