joaopn committed (verified)
Commit 4565769 · 1 Parent(s): 505752a

Update README.md

Files changed (1): README.md (+102 −3)

README.md CHANGED

The previous README contained only the front matter (`---` / `license: apache-2.0` / `---`); the commit replaces it with the full model card below.
---
license: apache-2.0
language:
- en
tags:
- text-classification
- onnx
- fp16
- roberta
- toxicity
- bias
- multi-class-classification
- multi-label-classification
- optimum
inference: false
---

This model is an FP16-optimized version of [protectai/unbiased-toxic-roberta-onnx](https://huggingface.co/protectai/unbiased-toxic-roberta-onnx). It runs exclusively on the GPU.

On an RTX 4090, it runs up to 2x faster than the base ONNX version. The speedup depends chiefly on your GPU's FP16:FP32 throughput ratio. For more comparison benchmarks and sample code for a related model, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).
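
To get a rough number on your own hardware, a minimal timing sketch along these lines should work (the batch of 256 identical short texts and the single warmup call are arbitrary choices here; the benchmark repo above measures this more carefully):

```python
import time
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "joaopn/unbiased-toxic-roberta-onnx-fp16"
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, file_name="model.onnx",
    provider="CUDAExecutionProvider", provider_options={"device_id": 0},
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

texts = ["This is a perfectly harmless sentence."] * 256  # arbitrary test batch
inputs = tokenizer(texts, padding=True, truncation=True,
                   return_tensors="pt", max_length=512).to("cuda:0")

with torch.no_grad():
    model(**inputs)  # warmup: the first call pays one-time kernel/allocation costs
    start = time.perf_counter()
    model(**inputs)
    elapsed = time.perf_counter() - start

print(f"~{len(texts) / elapsed:.0f} samples/s")
```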

### Usage

The model was generated with the script below. `AutoOptimizationConfig.O4()` is Optimum's most aggressive optimization level: it combines the full set of graph fusions with FP16 mixed precision, which is why the resulting model is GPU-only.

```python
from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig

# Load the base ONNX model on the GPU
model_id_onnx = "protectai/unbiased-toxic-roberta-onnx"
file_name = "model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})

# Apply O4 optimization (graph fusions + FP16) and save the result
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir='unbiased-toxic-roberta-onnx-fp16', optimization_config=optimization_config)
```

You will need the GPU version of the ONNX Runtime. It can be installed with

```
pip install optimum[onnxruntime-gpu] --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
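
To confirm that the GPU runtime is actually being used, check that `CUDAExecutionProvider` shows up in the list of available providers; ONNX Runtime can quietly fall back to CPU when the CUDA libraries are missing:

```python
import onnxruntime as ort

# 'CUDAExecutionProvider' must appear here for the model to run on the GPU
print(ort.get_available_providers())
```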

For convenience, this [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file to create a conda env with all the requirements. Below is an optimized, batched usage example:

```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

def sentiment_analysis_batched(df, batch_size, field_name):
    model_id = 'joaopn/unbiased-toxic-roberta-onnx-fp16'
    file_name = 'model.onnx'
    gpu_id = 0

    model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
    device = torch.device(f"cuda:{gpu_id}")

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    results = []

    # Precompute id2label mapping
    id2label = model.config.id2label

    total_samples = len(df)
    with tqdm(total=total_samples, desc="Processing samples") as pbar:
        for start_idx in range(0, total_samples, batch_size):
            end_idx = start_idx + batch_size
            texts = df[field_name].iloc[start_idx:end_idx].tolist()

            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
            input_ids = inputs['input_ids'].to(device)
            attention_mask = inputs['attention_mask'].to(device)

            with torch.no_grad():
                outputs = model(input_ids, attention_mask=attention_mask)
                predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification

            # Collect predictions on GPU
            results.append(predictions)

            pbar.update(len(texts))  # advance by the actual batch size (last batch may be smaller)

    # Concatenate all results on GPU, then move to CPU once
    all_predictions = torch.cat(results, dim=0).cpu().numpy()

    # Convert to DataFrame
    predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])

    # Add prediction columns to the original DataFrame
    combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)

    return combined_df

df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
```
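
Each output column is an independent probability, so labels are obtained by thresholding each column separately. A minimal sketch, assuming the `toxicity` column from the model's label set and an arbitrary (uncalibrated) cutoff of 0.5:

```python
# 'toxicity' is one of the prediction columns added above;
# 0.5 is an example threshold, not a calibrated value
df['is_toxic'] = df['toxicity'] > 0.5
print(df['is_toxic'].mean())  # fraction of comments flagged as toxic
```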