
---
license: openrail
---


<h3 align="center">PDF Paragraphs Extraction</h3>
<p align="center">A model for extracting paragraphs from PDFs</p>

This model uses features from the PDF to extract its text and paragraphs. It can be run as a service.

Each extracted paragraph includes its page number, its position on the page, its size, and its text.
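To make that description concrete, here is one way a client might represent an extracted paragraph. This is a sketch only: the class `Paragraph`, the helper `from_dict`, and the field names (`page_number`, `left`, `top`, `width`, `height`, `text`) are hypothetical, and the service's actual output keys and coordinate units may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    """Hypothetical record for one extracted paragraph.

    Field names are illustrative assumptions, not the service's actual schema.
    """

    page_number: int  # page the paragraph appears on
    left: float       # horizontal position of the paragraph's box on the page
    top: float        # vertical position of the paragraph's box on the page
    width: float      # width of the box
    height: float     # height of the box
    text: str         # the paragraph's text

    @property
    def right(self) -> float:
        # Right edge of the box, derived from position and size.
        return self.left + self.width

    @property
    def bottom(self) -> float:
        # Bottom edge of the box, derived from position and size.
        return self.top + self.height


def from_dict(raw: dict) -> Paragraph:
    """Build a Paragraph from a dict that uses the hypothetical keys above."""
    return Paragraph(
        page_number=int(raw["page_number"]),
        left=float(raw["left"]),
        top=float(raw["top"]),
        width=float(raw["width"]),
        height=float(raw["height"]),
        text=str(raw["text"]),
    )


p = from_dict(
    {"page_number": 1, "left": 72, "top": 90, "width": 451, "height": 38, "text": "Introduction ..."}
)
print(p.page_number, p.right, p.bottom)  # 1 523.0 128.0
```

Keeping position and size as separate fields mirrors the description above; derived edges such as `right` and `bottom` are computed rather than stored so they cannot drift out of sync.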

We have created a better, more flexible version of this service; you can find it here:

https://huggingface.co/HURIDOCS/pdf-document-layout-analysis


## Quick Start

Download the service that uses the model:

    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
    cd pdf_paragraphs_extraction

Start the service:

    ./run start

Get the paragraphs from a PDF:

    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051
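If you prefer to call the service from Python, the curl request above can be reproduced with the standard library. This is a minimal sketch that assumes the service listens on `http://localhost:5051`, as in the curl example, and expects the PDF in a form field named `file`. It returns the raw response body as text because this card does not document the response format. The helper names `build_multipart` and `get_paragraphs` are hypothetical, and labelling the upload as `application/pdf` is our own choice.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data, similar to curl's -F 'file=@...'."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to appear in the PDF bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def get_paragraphs(pdf_path: str, url: str = "http://localhost:5051") -> str:
    """Send the PDF like the curl example and return the raw response body."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    # The curl example sends the form with GET, so this sketch does the same.
    request = urllib.request.Request(
        url, data=body, method="GET", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running, `get_paragraphs("/PATH/TO/PDF/pdf_name.pdf")` sends the same kind of request as the curl command. Only the standard library is used, so the sketch runs without extra dependencies.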

To stop the service:

    ./run stop


## Performance

Accuracy: 93.9%

Speed: 0.15 seconds per page
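As a back-of-the-envelope check, the reported speed can be turned into a time estimate for a whole document. The sketch below assumes processing time grows linearly with page count and that your hardware matches the benchmark's; this card states neither. The function names are hypothetical.

```python
SECONDS_PER_PAGE = 0.15  # figure reported in the Performance section


def estimated_seconds(pages: int, seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Rough processing time, assuming time grows linearly with page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * seconds_per_page


def pages_per_minute(seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Throughput implied by a constant per-page cost."""
    return 60 / seconds_per_page


print(round(estimated_seconds(200), 2))  # 30.0
print(round(pages_per_minute()))         # 400
```

So a 200-page PDF would take roughly 30 seconds, or about 400 pages per minute, under these assumptions.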