Compatibility with olmOCR repo
Great work! Since you mention this is a "drop in replacement", can I "drop it in" to https://github.com/allenai/olmocr with the `--model` arg for `python -m olmocr.pipeline`? Figured I'd ask before trying, since you mention changes to the amount of metadata it wants to see, etc.

edit: I know you provide an example with vLLM, but that would require rebuilding `olmocr.pipeline` to have a CLI script I can point at a directory of PDF files.
Hi @pszemraj, the model should mostly be compatible with the olmocr pipeline, but with some tweaks: the prompt is different (you might want to modify this: https://github.com/allenai/olmocr/blob/main/olmocr/prompts/prompts.py), and the model arch is now Qwen2.5-VL instead of Qwen2-VL. The rest of it should be the same.
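For context, here is a hedged sketch of what that prompt tweak might look like; the existing function name in prompts.py and the replacement instruction below are assumptions on my part, so check the linked file and the RolmOCR model card for the exact prompt before relying on this:

```python
# Hypothetical edit to olmocr/prompts/prompts.py -- the function name and the
# replacement prompt text are assumptions, not verified against either repo.
def build_finetuning_prompt(base_text: str) -> str:
    # olmOCR normally builds a prompt around anchor text/metadata extracted
    # from the PDF; a RolmOCR-style prompt drops that and just asks for a
    # plain-text reading of the page image.
    return (
        "Return the plain text representation of this document "
        "as if you were reading it naturally."
    )
```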
Any follow up to this would be greatly appreciated @pszemraj @yifei-reducto
thanks @yifei-reducto! In the meantime I tried using the model with the original `pipeline.py` with some updates, such as manually forcing the prompts to be the same as the ones you specify, etc. I ran into some strange issues even after inference 'worked', like wild hallucinations/repeats, so I abandoned the original pipeline code/sglang and opted for your vLLM approach.
I workshopped `async_pipeline.py` in this gist with gemini-2.5 and it seems to work pretty well for batch inference. Don't quote me on this, but it's maybe even an order of magnitude faster than what I saw with the original (olmOCR) inference code.
Quick overview of the process:
- ensure you have vllm, flash-attn, and other deps installed as needed (see the script). flashinfer is nice to have, but how to get it to install is out of scope here lol
- serve the model locally in a separate tmux/screen/terminal with `vllm serve reducto/RolmOCR`
- after the endpoint is ready, run `python async_pipeline.py --input_dir ./directory-of-pdfs` (the output dir is inferred/named from the input dir, or pass `--output_dir ./out`)
PDFs are converted to images, which are fired off async in batches of `--concurrency_limit` for fast vLLM inference. Can't claim the code is fully optimal, but it works well enough based on my tests. Hope this helps anyone reading!
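For anyone who wants the shape of that pattern without pulling the gist, here is a minimal sketch of the client side, assuming a local OpenAI-compatible vLLM server started with `vllm serve reducto/RolmOCR` on the default port 8000, plus the `openai`, `pdf2image` (needs poppler), and Pillow packages. The prompt string, concurrency value, DPI, and file layout are placeholders, not the exact settings from `async_pipeline.py`:

```python
# Minimal sketch of the batch-OCR pattern described above -- NOT the actual
# async_pipeline.py from the gist. Assumes a local OpenAI-compatible vLLM
# server (`vllm serve reducto/RolmOCR`) listening on http://localhost:8000.
import asyncio
import base64
import io
from pathlib import Path

from openai import AsyncOpenAI
from pdf2image import convert_from_path

MODEL = "reducto/RolmOCR"
# Placeholder instruction -- use the exact prompt recommended for RolmOCR.
PROMPT = "Return the plain text representation of this document."
CONCURRENCY_LIMIT = 8  # analogous to --concurrency_limit in the gist


def page_to_data_url(image) -> str:
    """Encode a rendered PDF page (PIL image) as a base64 data URL."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()


async def ocr_page(client: AsyncOpenAI, sem: asyncio.Semaphore, image) -> str:
    """Send one page image to the vLLM endpoint and return the OCR text."""
    async with sem:  # cap in-flight requests so the server can batch efficiently
        resp = await client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": page_to_data_url(image)}},
                    {"type": "text", "text": PROMPT},
                ],
            }],
            max_tokens=4096,
            temperature=0.2,
        )
        return resp.choices[0].message.content or ""


async def ocr_pdf(client: AsyncOpenAI, sem: asyncio.Semaphore, pdf: Path) -> str:
    """Render a PDF to page images and OCR all pages concurrently."""
    pages = convert_from_path(str(pdf), dpi=150)
    texts = await asyncio.gather(*(ocr_page(client, sem, p) for p in pages))
    return "\n\n".join(texts)


async def main(input_dir: str, output_dir: str) -> None:
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    sem = asyncio.Semaphore(CONCURRENCY_LIMIT)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        text = await ocr_pdf(client, sem, pdf)
        (out / f"{pdf.stem}.txt").write_text(text)
        print(f"wrote {pdf.stem}.txt")


if __name__ == "__main__":
    asyncio.run(main("./directory-of-pdfs", "./out"))
```

The semaphore is what makes the concurrency-limit idea work: it caps the number of in-flight requests so vLLM can keep its batches full without the client flooding the server.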