Quantized-distilbert-banking77

This model is a dynamically quantized version of optimum/distilbert-base-uncased-finetuned-banking77, a DistilBERT model fine-tuned on the banking77 dataset.

The model was created using the dynamic-quantization notebook from a workshop presented at MLOps World 2022.
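Dynamic quantization converts the weights to 8-bit integers ahead of time. The activation scale is computed at inference time from each input, so no calibration dataset is needed. The pure-Python sketch below illustrates that idea for a single dot product, assuming symmetric per-tensor int8 quantization. It is not the ONNX Runtime implementation used by the workshop notebook, which may, for example, quantize activations asymmetrically. The function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Quantize floats to signed integers using one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-qmax, qmax]
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dynamic_quantized_dot(weights_q, weight_scale, activations):
    """Dot product of pre-quantized weights with float activations."""
    # Dynamic step: the activation scale is derived from this input at runtime
    act_q, act_scale = quantize_symmetric(activations)
    # Integer multiply-accumulate (an int32 accumulator on real hardware)
    acc = sum(w * a for w, a in zip(weights_q, act_q))
    # Map the integer result back to float using both scales
    return acc * weight_scale * act_scale


# Weights are quantized once, offline
weights = [0.5, -1.2, 0.33, 2.0]
weights_q, weight_scale = quantize_symmetric(weights)

# Activations arrive at inference time and are quantized on the fly
x = [1.0, 0.25, -0.8, 0.1]
approx = dynamic_quantized_dot(weights_q, weight_scale, x)
exact = sum(w * a for w, a in zip(weights, x))
print(weights_q, round(exact, 4), round(approx, 4))
```

The integer result stays close to the float one because rounding errors are bounded by half a quantization step per element. A single outlier that inflates `max_abs` coarsens every step, which is one reason per-channel scales are sometimes preferred.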

It achieves the following results on the evaluation set:

Accuracy

  • Vanilla model: 92.5%
  • Quantized model: 92.44%

The quantized model retains 99.93% of the FP32 (vanilla) model's accuracy.

Latency

Payload sequence length: 128
Instance type: AWS c6i.xlarge

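| Latency | Vanilla transformers | Quantized optimum model | Improvement |
|---------|----------------------|-------------------------|-------------|
| p95     | 63.24 ms             | 37.06 ms                | 1.71x       |
| avg     | 62.87 ms             | 37.93 ms                | 1.66x       |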
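The improvement column is the ratio of vanilla to quantized latency (for example, 63.24 / 37.06 ≈ 1.71). The sketch below shows one way such numbers could be collected. It assumes a simple warm-up-then-time loop using `time.perf_counter`. The `measure_latency` and `speedup` helpers and their defaults are hypothetical, not the benchmarking code behind the table.

```python
import statistics
import time


def measure_latency(fn, runs=100, warmup=10):
    """Time repeated calls to fn and return avg and p95 latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the statistics
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns the 1st..99th percentiles; index 94 is p95
    p95 = statistics.quantiles(latencies_ms, n=100)[94]
    return {"avg": statistics.mean(latencies_ms), "p95": p95}


def speedup(baseline_ms, optimized_ms):
    """Improvement factor as reported in the table: baseline / optimized."""
    return baseline_ms / optimized_ms


# Reproduce the improvement column from the table values
print(f"p95: {speedup(63.24, 37.06):.2f}x")  # p95: 1.71x
print(f"avg: {speedup(62.87, 37.93):.2f}x")  # avg: 1.66x
```

With the classifier from the usage example, a call might look like `measure_latency(lambda: classifier(payload))`, where `payload` is a hypothetical input of the desired sequence length.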

How to use

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import pipeline, AutoTokenizer

# Load the quantized ONNX model and its tokenizer from the Hugging Face Hub
model = ORTModelForSequenceClassification.from_pretrained("lewtun/quantized-distilbert-banking77")
tokenizer = AutoTokenizer.from_pretrained("lewtun/quantized-distilbert-banking77")

# Wrap both in a standard transformers pipeline; inference runs on ONNX Runtime
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("What is the exchange rate like on this app?")
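For a single-label classifier like this one, the text-classification pipeline applies a softmax to the model's logits (one per banking77 intent). It then returns the highest-scoring label together with its probability. The pure-Python sketch below mirrors that post-processing step. The three-label `id2label` mapping and the logit values are made up for illustration; the real model reads its 77 labels from its config.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Return the top prediction in the same shape the pipeline returns."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical 3-class subset and made-up logits, for illustration only
id2label = {0: "card_arrival", 1: "exchange_rate", 2: "top_up_failed"}
logits = [0.3, 4.1, -1.2]
print(postprocess(logits, id2label))
```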