---
license: openrail
---

<h3 align="center">PDF Paragraphs Extraction</h3>

<p align="center">A model for extracting paragraphs from PDFs</p>

This model uses features of the PDF to extract its text and paragraphs. It can be used as a service.

Each paragraph contains its page number, its position on the page, its size, and its text.
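
As a rough illustration of the fields listed above, a paragraph could be modeled as follows. This is only a sketch: the class name `Paragraph`, the helper `group_text_by_page`, and all field names (`page_number`, `left`, `top`, `width`, `height`, `text`) are assumptions made for illustration, not the service's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical container for one extracted paragraph (field names are assumed)."""
    page_number: int  # page the paragraph appears on
    left: float       # position: horizontal offset on the page
    top: float        # position: vertical offset on the page
    width: float      # size: horizontal extent
    height: float     # size: vertical extent
    text: str         # the paragraph text


def group_text_by_page(paragraphs):
    """Join paragraph texts per page, preserving the input order."""
    pages = {}
    for paragraph in paragraphs:
        pages.setdefault(paragraph.page_number, []).append(paragraph.text)
    return {page: "\n\n".join(texts) for page, texts in pages.items()}
```

Grouping by page number is one plausible use of this metadata, for example to rebuild a page's text from its paragraphs.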

A better and more flexible version of this service is available here:

https://huggingface.co/HURIDOCS/pdf-document-layout-analysis

## Quick Start

Download the service that uses the model:

    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
    cd pdf_paragraphs_extraction

Start the service:

    ./run start

Get the paragraphs from a PDF:

    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051
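
The same request can be sketched in Python using only the standard library. The function names `build_multipart` and `extract_paragraphs` are hypothetical, the HTTP method defaults to GET only to mirror the curl command above, and decoding the response as JSON is an assumption about the service's output rather than documented behavior.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, data):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_paragraphs(pdf_path, url="http://localhost:5051", method="GET"):
    """Send a PDF to the running service and return the decoded response.

    Assumption: the service answers with JSON. Adjust the decoding if it does not.
    """
    path = Path(pdf_path)
    # "file" matches the form field name used by the curl command.
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service started, `extract_paragraphs("/PATH/TO/PDF/pdf_name.pdf")` would send the file in the same way as the curl call.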

Stop the service:

    ./run stop

## Performance

Accuracy: 93.9%

Speed: 0.15 seconds per page