---
license: openrail
---
<h3 align="center">PDF Paragraphs Extraction</h3>
<p align="center">A model for extracting paragraphs from PDFs</p>
This model uses features from the PDF to extract the text and paragraphs from it. It can be used as a service.
The paragraphs contain the page number, the position in the page, the size, and the text.
A newer and more flexible version of this service is available here:
https://huggingface.co/HURIDOCS/pdf-document-layout-analysis
## Quick Start
Download the service that uses the model:
git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
cd pdf_paragraphs_extraction
Start the service:
./run start
Get the paragraphs from a PDF:
curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051
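
The same request can also be sent from a script. The following is a minimal sketch using only the Python standard library, not part of the service itself. It mirrors the `curl` command above: the PDF goes in a multipart form field named `file`, sent to `localhost:5051` with the GET method. The function names `build_pdf_request` and `extract_paragraphs`, the timeout value, and the UTF-8 decoding of the response are assumptions made for illustration.

```python
import urllib.request
import uuid
from pathlib import Path


def build_pdf_request(pdf_path, url="http://localhost:5051"):
    """Build a multipart/form-data request with the PDF in a 'file' field.

    Hypothetical helper: it reproduces what `curl -X GET -F 'file=@...'` sends.
    """
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex  # random separator between form parts

    # Header of the single form part, followed by a blank line
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that ends the multipart body
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url,
        data=head + path.read_bytes() + tail,
        method="GET",  # same method as the curl command above
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def extract_paragraphs(pdf_path, url="http://localhost:5051", timeout=120):
    """Send the PDF to a running service and return the raw response text.

    Assumption: the response body is UTF-8 text. The timeout is arbitrary.
    """
    request = build_pdf_request(pdf_path, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `print(extract_paragraphs("/PATH/TO/PDF/pdf_name.pdf"))` would print whatever the service returns for that file.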
Stop the service:
./run stop
## Performance
Accuracy: 93.9%
Speed: 0.15 seconds per page