	---
	license: openrail
	---


	<h3 align="center">PDF Reading Order</h3>
	<p align="center">A model for determining the correct reading order of PDF files.</p>

	This model uses features extracted from a given PDF to determine its correct reading order.



	## Quick Start

	This model works on top of two of our other models: pdf-token-type and pdf-paragraphs-extraction.
	The paragraph extraction model is used here to find and extract "figure" and "table" tokens. Because figures and tables contain many tokens, removing them reduces the complexity of a given PDF page.

	For details on the paragraph extraction model, see these links:

	https://huggingface.co/HURIDOCS/pdf-segmetation
	https://github.com/huridocs/pdf_paragraphs_extraction.git

	You can clone the repo via this link:

	https://github.com/huridocs/pdf-reading-order


	First, the candidate selector model selects the tokens that could come next.
	Then, the 18 best tokens chosen by the candidate selector model are passed to the reading order model.
	The reading order model decides the final reading order of the tokens.
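
The two-stage flow described above can be sketched as a greedy loop. This is only an illustration under stated assumptions, not the repository's implementation: `greedy_reading_order`, `candidate_score` and `order_score` are hypothetical names, the two scoring callables stand in for the trained models, and taking the first token as the starting point is an assumption made to keep the sketch short. Only the cut-off of 18 candidates comes from the description above.

```python
from typing import Callable, List, Sequence, TypeVar

Token = TypeVar("Token")

TOP_K = 18  # number of candidates handed to the second stage, as described above


def greedy_reading_order(
    tokens: Sequence[Token],
    candidate_score: Callable[[Token, Token], float],  # hypothetical stand-in for the candidate selector
    order_score: Callable[[Token, Token], float],      # hypothetical stand-in for the reading order model
    top_k: int = TOP_K,
) -> List[Token]:
    """Order tokens with a select-then-rank loop (illustrative sketch only)."""
    if not tokens:
        return []
    remaining = list(tokens)
    ordered = [remaining.pop(0)]  # assumption: the first token is the starting point
    while remaining:
        current = ordered[-1]
        # Stage 1: keep only the top_k most plausible next tokens.
        candidates = sorted(
            remaining, key=lambda t: candidate_score(current, t), reverse=True
        )[:top_k]
        # Stage 2: the second scorer picks the next token among the candidates.
        best = max(candidates, key=lambda t: order_score(current, t))
        remaining.remove(best)
        ordered.append(best)
    return ordered
```

With toy scorers such as `candidate_score=lambda a, b: -abs(b - a)` and `order_score=lambda a, b: -b` on integer "tokens", the loop returns them in ascending order. The point of the first stage is to shrink the search space, so that the more expensive second stage only compares a small, fixed number of tokens at each step.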


	## Performance

	Test accuracy: 16 mistakes / 11438 labels (99.86%)
	Average accuracy: 431 mistakes / 184995 labels (99.77%)
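
The percentages above follow directly from the mistake and label counts. The short check below recomputes them; `accuracy_percent` is a hypothetical helper written for this illustration, and it assumes accuracy is defined as the share of labels without a mistake.

```python
def accuracy_percent(mistakes: int, labels: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * (1 - mistakes / labels), 2)


print(accuracy_percent(16, 11438))    # 99.86
print(accuracy_percent(431, 184995))  # 99.77
```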

	Speed: ~0.65 seconds per page.