nougat-small onnx

This is https://huggingface.co/facebook/nougat-small exported to ONNX. The weights are not quantized.
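For reference, a minimal sketch of how such an export can be reproduced with optimum's `main_export` API; the output directory name is an arbitrary choice here, and depending on the optimum version an explicit `task` argument may be needed:

```python
from optimum.exporters.onnx import main_export

# export the original checkpoint to ONNX; the task is normally
# auto-detected for VisionEncoderDecoder models like nougat
main_export(
    "facebook/nougat-small",
    output="nougat-small-onnx",  # arbitrary output directory
)
```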

```python
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = "pszemraj/nougat-small-onnx"
processor = NougatProcessor.from_pretrained(model_name)
model = ORTModelForVision2Seq.from_pretrained(
    model_name,
    provider="CPUExecutionProvider",  # use "CUDAExecutionProvider" for GPU
    use_merged=False,
    use_io_binding=True,
)
```

On a CPU-only Colab runtime (at the time of writing) you may get CuPy errors; to fix this, uninstall it:

```sh
pip uninstall cupy-cuda11x -y
```

how do da inference?

See here, or the basic notebook I uploaded. ONNX seems to bring CPU inference times into 'feasible' territory: it took ~15 minutes for Attention is All You Meme on the Colab free CPU runtime.
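A minimal inference sketch, continuing from the loading snippet above; `page.png` is a hypothetical rasterized page image, and the generation settings mirror the standard nougat example rather than anything specific to this export:

```python
from PIL import Image

# load a rasterized PDF page (hypothetical path)
image = Image.open("page.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# autoregressive decoding; the token budget is an assumption, tune as needed
outputs = model.generate(
    pixel_values,
    min_length=1,
    max_new_tokens=1024,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
markdown = processor.post_process_generation(markdown, fix_markdown=True)
print(markdown)
```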
