	---
	language:
	- en
	- zh
	tags:
	- florence-2
	- document-vqa
	- image-text-retrieval
	- fine-tuned
	license: mit
	base_model: microsoft/Florence-2-base-ft
	---

	# adamchanadam/Test_Florence-2-FT-DocVQA

	This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.

	## Model description

	- Fine-tuned for answering questions about images, specifically focused on logo recognition and company information.
	- The model uses the `<DocVQA>` prompt to indicate the task type.
	- Number of training images: 2
	- Number of epochs: 15
	- Learning rate: 5e-06
	- Optimizer: AdamW
	- Early stopping: patience of 3 epochs, minimum improvement (delta) of 0.01
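The early-stopping rule listed above can be sketched roughly as follows. This is a minimal illustration, not the training script used for this model: the `EarlyStopping` class and `first_stop_epoch` helper are hypothetical names, and the sketch assumes that "delta" means the validation loss must drop by more than 0.01 to count as an improvement.

```python
from typing import Optional, Sequence


class EarlyStopping:
    """Hypothetical helper: stop when validation loss stalls for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.01) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # assumed meaning of "delta of 0.01"
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Loss improved by more than min_delta: reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def first_stop_epoch(val_losses: Sequence[float]) -> Optional[int]:
    """Return the 1-based epoch at which stopping triggers, or None if it never does."""
    stopper = EarlyStopping(patience=3, min_delta=0.01)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return None
```

With the model card's settings, training would run for at most 15 epochs and end earlier if the validation loss fails to improve by more than 0.01 for 3 consecutive epochs.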

	## Intended use & limitations

	- Use for answering questions about logos and company information in images
	- Performance is likely to be limited for questions or image content not represented in the training data, particularly given the very small number of images used for fine-tuning
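A minimal inference sketch, assuming the repository hosts both the model weights and the processor files and that it loads the same way as the base Florence-2 checkpoints (via `trust_remote_code=True`). The helper names `build_prompt` and `answer_question`, the generation settings, and the plain-text decoding are illustrative assumptions, not details confirmed by this model card.

```python
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"
TASK_PROMPT = "<DocVQA>"  # task prefix named in the model description


def build_prompt(question: str) -> str:
    """Prefix the question with the task token (assumed format: token + question)."""
    return TASK_PROMPT + question.strip()


def answer_question(image_path: str, question: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: run one question against one image."""
    # Heavy dependencies are imported lazily so build_prompt stays usable without them.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 checkpoints ship custom modeling code, hence trust_remote_code.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=64,  # assumed limit; answers are expected to be short
        num_beams=3,
    )
    # Decode to plain text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

For example, `answer_question("logo.png", "Which company does this logo belong to?")` would return the model's text answer, where `logo.png` is a placeholder path.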

	## Training procedure

	- Images were resized and normalized according to Florence-2's preprocessing standards.
	- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
	- Questions and answers were provided for each image in the training set.
	- Batch size: 2
	- Evaluation metric: Cross-entropy loss on a held-out validation set
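The data preparation described above might look roughly like this. It is a sketch under stated assumptions: the `QAExample` record, the helper names, and the exact prompt format (`<DocVQA>` followed directly by the question) are hypothetical; only the task prompt and the batch size of 2 come from this model card. Image resizing and normalization are left to the Florence-2 processor and are not shown.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

TASK_PROMPT = "<DocVQA>"  # task prefix used during fine-tuning
BATCH_SIZE = 2            # batch size stated in the model card


@dataclass
class QAExample:
    """Hypothetical record: one question/answer pair tied to one image."""
    image_path: str
    question: str
    answer: str


def to_prompt_and_target(example: QAExample) -> Tuple[str, str]:
    """Build the text fed to the model and the text it should generate."""
    return TASK_PROMPT + example.question, example.answer


def iter_batches(
    examples: List[QAExample], batch_size: int = BATCH_SIZE
) -> Iterator[List[Tuple[str, str, str]]]:
    """Yield lists of (image_path, prompt, target) with at most batch_size items."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        yield [(ex.image_path, *to_prompt_and_target(ex)) for ex in chunk]
```

Each batch would then be passed through the processor to produce `input_ids` and `pixel_values`, with the tokenized targets used as labels for the cross-entropy loss.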

	For more information, please contact the model creators.