ONNX and ORT models, including quantized versions, of google-bert/bert-large-cased-whole-word-masking-finetuned-squad

Click here for the Japanese README

This repository contains the ONNX and ORT formats of the model google-bert/bert-large-cased-whole-word-masking-finetuned-squad, along with quantized versions.

License

The license for this model is "apache-2.0". For details, please refer to the original model page: google-bert/bert-large-cased-whole-word-masking-finetuned-squad.

Usage

To use this model, install ONNX Runtime and Transformers (for example, pip install onnxruntime transformers), then run inference as shown below.

# Example code
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking-finetuned-squad')

# Prepare inputs: this is a SQuAD question-answering model,
# so pass a question together with the context that contains the answer
question = 'Replace this text with your question.'
context = 'Replace this text with the passage that contains the answer.'
inputs = tokenizer(question, context, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Load the model; InferenceSession reads both .onnx and .ort files
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    # Feed only the inputs that the model graph declares
    feed = {node.name: inputs[node.name] for node in session.get_inputs()}

    # Run inference
    outputs = session.run(None, feed)

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
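
The raw outputs still have to be turned into an answer span. The following is a minimal sketch of one common way to do that, not necessarily how this repository's author post-processes results. It assumes the first two outputs are the start and end logits, in that order, which is typical for question-answering exports but is not confirmed by this README. The function name best_answer_span and the max_answer_len parameter are hypothetical, and the toy logits are made-up values.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices that maximize start + end logit.

    Only spans with start <= end and at most max_answer_len tokens are
    considered. Hypothetical helper, not part of this repository.
    """
    best_score = float("-inf")
    best_span = (0, 0)
    n = len(start_logits)
    for start in range(n):
        # Limit the end index so the span has at most max_answer_len tokens
        last = min(n, start + max_answer_len)
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    return best_span


# Toy logits standing in for outputs[0][0] and outputs[1][0] (assumed order)
start = [0.1, 0.2, 5.0, 0.3, 0.1]
end = [0.0, 0.1, 0.2, 4.0, 6.0]
print(best_answer_span(start, end))  # (2, 4)
```

With real outputs, and assuming the tokenizer was called with a question and a context, something like s, e = best_answer_span(outputs[0][0], outputs[1][0]) followed by tokenizer.decode(inputs['input_ids'][0][s:e + 1]) would produce the answer text. This sketch does not mask out question tokens or special tokens, so a more careful version would restrict the search to context positions.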

Contents of the Model

This repository includes the following models:

ONNX Models

  • onnx_models/model.onnx: Original ONNX model converted from google-bert/bert-large-cased-whole-word-masking-finetuned-squad
  • onnx_models/model_opt.onnx: Optimized ONNX model
  • onnx_models/model_fp16.onnx: FP16 (half-precision) model
  • onnx_models/model_int8.onnx: INT8 quantized model
  • onnx_models/model_uint8.onnx: UINT8 quantized model

ORT Models

  • ort_models/model.ort: ORT model using the optimized ONNX model
  • ort_models/model_fp16.ort: ORT model using the FP16 (half-precision) model
  • ort_models/model_int8.ort: ORT model using the INT8 quantized model
  • ort_models/model_uint8.ort: ORT model using the UINT8 quantized model
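
The file layout above can be captured in a small lookup table so that scripts select a variant by name instead of hard-coding paths. This is only an illustrative sketch: the paths are copied from the lists above, while the labels ("original", "optimized", and so on) and the model_path helper are hypothetical names introduced here.

```python
# Paths copied from the lists above; the dictionary keys are hypothetical labels.
MODEL_FILES = {
    "onnx": {
        "original": "onnx_models/model.onnx",
        "optimized": "onnx_models/model_opt.onnx",
        "fp16": "onnx_models/model_fp16.onnx",
        "int8": "onnx_models/model_int8.onnx",
        "uint8": "onnx_models/model_uint8.onnx",
    },
    "ort": {
        # model.ort is built from the optimized ONNX model
        "optimized": "ort_models/model.ort",
        "fp16": "ort_models/model_fp16.ort",
        "int8": "ort_models/model_int8.ort",
        "uint8": "ort_models/model_uint8.ort",
    },
}


def model_path(fmt: str, variant: str) -> str:
    """Return the relative path for a format ('onnx' or 'ort') and a variant."""
    try:
        return MODEL_FILES[fmt][variant]
    except KeyError:
        raise ValueError(f"No {fmt!r} model for variant {variant!r}") from None


print(model_path("ort", "int8"))  # ort_models/model_int8.ort
```

Note that there is no ORT counterpart of the unoptimized model.onnx, so model_path("ort", "original") raises ValueError.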

Notes

Please adhere to the license and usage conditions of the original model google-bert/bert-large-cased-whole-word-masking-finetuned-squad.

Contribution

If you find any issues or have improvements, please create an issue or submit a pull request.
