---
license: openrail
---
# A model for extracting paragraphs from PDFs
This model uses features from the PDF to extract its text and paragraphs. It can be used as a service. Each paragraph includes its page number, its position on the page, its size, and its text.

A better and more flexible version of this service is available here: https://huggingface.co/HURIDOCS/pdf-document-layout-analysis

## Quick Start

Download the service that uses the model:

    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
    cd pdf_paragraphs_extraction

Start the service:

    ./run start

Get the paragraphs from a PDF:

    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051

To stop the service:

    ./run stop

## Performance

Accuracy: 93.9%

Speed: 0.15 seconds per page
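
The `curl` request from the Quick Start can also be issued from a script. The following is a minimal Python sketch, not the service's documented client. It assumes that the service answers with a JSON list and that each item carries the fields described above under the names `page_number`, `left`, `top`, `width`, `height`, and `text`. Those field names, the `Paragraph` class, and both helper functions are hypothetical, so check them against a real response before relying on them.

```python
import json
import subprocess
from dataclasses import dataclass

# Address taken from the Quick Start command.
SERVICE_URL = "localhost:5051"


@dataclass
class Paragraph:
    """One extracted paragraph. ASSUMPTION: field names are hypothetical."""
    page_number: int
    left: float
    top: float
    width: float
    height: float
    text: str


def parse_paragraphs(payload: str) -> list:
    """Parse a response body into Paragraph objects.

    ASSUMPTION: the body is a JSON list of objects with the keys used below.
    """
    items = json.loads(payload)
    return [
        Paragraph(
            page_number=int(item["page_number"]),
            left=float(item["left"]),
            top=float(item["top"]),
            width=float(item["width"]),
            height=float(item["height"]),
            text=str(item["text"]),
        )
        for item in items
    ]


def extract_paragraphs(pdf_path: str) -> list:
    """Send a PDF to the running service with the same curl call as the Quick Start."""
    result = subprocess.run(
        ["curl", "-s", "-X", "GET", "-F", f"file=@{pdf_path}", SERVICE_URL],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if curl exits with an error
    )
    return parse_paragraphs(result.stdout)
```

With the service running, `extract_paragraphs("/PATH/TO/PDF/pdf_name.pdf")` would return one `Paragraph` per extracted paragraph, provided the response matches the assumed shape. Keeping the parsing in its own function means only `parse_paragraphs` needs to change if the real field names differ.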