---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16-optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's FP16:FP32 ratio. For more comparison benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).

### Usage

The model was generated with

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"

# Load the base ONNX model on the GPU
model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})

# Apply the O4 optimization preset and save the result
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```

You will need the GPU version of the ONNX Runtime. It can be installed with

```
pip install optimum[onnxruntime-gpu] --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file to create a conda env with all the requirements.
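
Since the model only runs on the GPU, it can help to confirm that the installed ONNX Runtime build actually exposes the CUDA provider before loading it. The following is a minimal sketch, not part of the original workflow: `has_cuda_provider` is a hypothetical helper name, and the check relies on `onnxruntime.get_available_providers()`.

```python
def has_cuda_provider(providers):
    """Return True if the CUDA execution provider is in the given list."""
    return "CUDAExecutionProvider" in providers


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed at all
    available = []

if has_cuda_provider(available):
    print("CUDAExecutionProvider is available.")
else:
    print("CUDAExecutionProvider not found; install the GPU build of ONNX Runtime.")
```

If the provider is missing, the usual cause is that the CPU-only `onnxruntime` package is installed instead of the GPU build.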

Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

def sentiment_analysis_batched(df, batch_size, field_name):
    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            end_idx = start_idx + batch_size
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
                predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            # Advance by the actual batch size (the last batch may be smaller)
            pbar.update(len(texts))

    # Concatenate all results on GPU
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df

df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
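
Because the example applies a sigmoid rather than a softmax, each label gets an independent score in (0, 1), and several labels can be active for the same text. The sketch below illustrates how such scores could be turned into label flags. It is only an illustration under stated assumptions: `flag_labels` is a hypothetical helper, the 0.5 threshold is an arbitrary choice rather than a recommended value, and the label names and logits are made up for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability-like score."""
    return 1.0 / (1.0 + math.exp(-x))


def flag_labels(logits: dict, threshold: float = 0.5) -> list:
    """Return the sorted labels whose sigmoid score reaches the threshold.

    The threshold is an assumption for illustration; tune it per label
    on your own data.
    """
    scores = {label: sigmoid(value) for label, value in logits.items()}
    return sorted(label for label, score in scores.items() if score >= threshold)


# Made-up logits for a single text (label names are illustrative only)
example_logits = {"toxicity": 2.0, "insult": 0.5, "threat": -3.0}

print(flag_labels(example_logits))                 # labels at or above 0.5
print(flag_labels(example_logits, threshold=0.7))  # stricter cut-off
```

A stricter threshold trades recall for precision, so the right value depends on how costly false positives are in your application.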