ONNX and ORT models with quantization of google-bert/bert-large-cased-whole-word-masking-finetuned-squad

Click here for the Japanese README

This repository contains the ONNX and ORT formats of the model google-bert/bert-large-cased-whole-word-masking-finetuned-squad, along with quantized versions.

License

The license for this model is "apache-2.0". For details, please refer to the original model page: google-bert/bert-large-cased-whole-word-masking-finetuned-squad.

Usage

To use these models, install ONNX Runtime, Transformers (for the tokenizer), and NumPy, then run inference as shown below.

# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking-finetuned-squad')

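# Prepare inputs (this is a question-answering model, so pass a question and a context)
question = 'Replace this text with your question.'
context = 'Replace this text with the passage that contains the answer.'
inputs = tokenizer(question, context, return_tensors='np')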

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',    # ONNX model
    'ort_models/model.ort'  # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
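    else:
        # Load the ONNX model
        # (since ORT 1.9, providers must be set explicitly on builds that include non-CPU providers)
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])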

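    # Run inference, feeding only the inputs this model actually declares
    input_names = {model_input.name for model_input in session.get_inputs()}
    outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})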

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
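
The script above prints raw outputs only. For a SQuAD-style model these are normally start and end logits, one score per token, and the answer is the span whose start and end scores sum highest. The following is a minimal sketch of that span selection, not code from this repository: the function name best_answer_span and the max_answer_len default are illustrative, and the sample logits are synthetic.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,  # illustrative default, not taken from the model
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end.

    The score is start_logits[start] + end_logits[end]. Spans longer than
    max_answer_len tokens are skipped.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError('start_logits and end_logits must have the same length')
    if len(start_logits) == 0:
        raise ValueError('logits must not be empty')

    best = (0, 0, float('-inf'))
    for start, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = float(start_score) + float(end_logits[end])
            if score > best[2]:
                best = (start, end, score)
    return best


# Synthetic logits for a 6-token sequence (not real model output).
# Taking each argmax independently would give start=2, end=1, which is not a valid span.
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0]
end_logits = [0.0, 2.5, 0.1, 0.4, 2.0, 0.2]
start, end, score = best_answer_span(start_logits, end_logits)
print(start, end, score)
```

With the inference script, and assuming the exported model returns start logits as outputs[0] and end logits as outputs[1] (check session.get_outputs() to confirm), the call would be best_answer_span(outputs[0][0], outputs[1][0]), and the answer text could then be recovered with tokenizer.decode(inputs['input_ids'][0][start:end + 1]). A production version would also restrict candidate positions to the context tokens.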

Contents of the Model

This repository includes the following models:

ONNX Models

onnx_models/model.onnx: Original ONNX model converted from google-bert/bert-large-cased-whole-word-masking-finetuned-squad
onnx_models/model_opt.onnx: Optimized ONNX model
onnx_models/model_fp16.onnx: FP16 quantized model
onnx_models/model_int8.onnx: INT8 quantized model
onnx_models/model_uint8.onnx: UINT8 quantized model

ORT Models

ort_models/model.ort: ORT model using the optimized ONNX model
ort_models/model_fp16.ort: ORT model using the FP16 quantized model
ort_models/model_int8.ort: ORT model using the INT8 quantized model
ort_models/model_uint8.ort: ORT model using the UINT8 quantized model
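
The file names above are not fully symmetric: the optimized model is model_opt.onnx in ONNX form but model.ort in ORT form, and the unoptimized original exists only as ONNX. A small lookup table, sketched below, can keep that mapping in one place. The paths come from the lists above; the helper name model_path and the variant keys are hypothetical.

```python
# Paths as listed in this README; the dictionary keys are illustrative labels
ONNX_FILES = {
    'original': 'onnx_models/model.onnx',
    'optimized': 'onnx_models/model_opt.onnx',
    'fp16': 'onnx_models/model_fp16.onnx',
    'int8': 'onnx_models/model_int8.onnx',
    'uint8': 'onnx_models/model_uint8.onnx',
}
ORT_FILES = {
    'optimized': 'ort_models/model.ort',
    'fp16': 'ort_models/model_fp16.ort',
    'int8': 'ort_models/model_int8.ort',
    'uint8': 'ort_models/model_uint8.ort',
}


def model_path(variant: str, fmt: str = 'onnx') -> str:
    """Return the repository-relative path for a model variant and file format."""
    tables = {'onnx': ONNX_FILES, 'ort': ORT_FILES}
    if fmt not in tables:
        raise ValueError(f'unknown format: {fmt!r}')
    files = tables[fmt]
    if variant not in files:
        raise ValueError(f'no {fmt} file for variant {variant!r}')
    return files[variant]


print(model_path('int8', 'ort'))
print(model_path('optimized'))
```

The returned string can be passed directly to ort.InferenceSession in place of a hard-coded path.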

Notes

Please adhere to the license and usage conditions of the original model google-bert/bert-large-cased-whole-word-masking-finetuned-squad.

Contribution

If you find any issues or have suggestions for improvement, please create an issue or submit a pull request.