---

language:
  - en
  - zh
tags:
  - florence-2
  - document-vqa
  - image-text-retrieval
  - fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---


# adamchanadam/Test_Florence-2-FT-DocVQA



This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.



## Model description



- Fine-tuned for answering questions about images, specifically focused on logo recognition and company information.

- The model uses the `<DocVQA>` prompt to indicate the task type.

- Number of unique images: 28

- Number of epochs: 7

- Learning rate: 1e-06

- Optimizer: AdamW

- Early stopping: Patience of 2 epochs, delta of 0.0001



Dataset statistics:

- Total number of questions for fine-tuning: 560
- logo_recognition: 49 (8.75%)
- brand_identification: 48 (8.57%)
- visual_elements: 65 (11.61%)
- text_in_logo: 57 (10.18%)
- industry_classification: 49 (8.75%)
- product_service: 55 (9.82%)
- company_details: 89 (15.89%)
- negative_sample: 148 (26.43%)
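As a quick sanity check, the per-category shares can be recomputed from the counts listed above. This is only an illustrative sketch; the dictionary name `counts` is not taken from the training code.

```python
# Question counts per category, copied from the dataset statistics above.
counts = {
    "logo_recognition": 49,
    "brand_identification": 48,
    "visual_elements": 65,
    "text_in_logo": 57,
    "industry_classification": 49,
    "product_service": 55,
    "company_details": 89,
    "negative_sample": 148,
}

total = sum(counts.values())

# Share of each category as a percentage of all questions, rounded to 2 decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share}%)")
print(f"total: {total}")
```

The counts sum to the stated total of 560, and negative samples form the largest single category.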

## Intended use & limitations

- Use for answering questions about logos and company information in images
- Performance may be limited for questions or image content not represented in the training data
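
A minimal inference sketch follows. It assumes this repository can be loaded the same way as the base Florence-2 checkpoints (via `transformers` with `trust_remote_code=True`) and that the question is appended directly to the `<DocVQA>` prompt, as is common in Florence-2 DocVQA fine-tuning examples. The helper names `build_prompt` and `answer_question`, the beam count, and the token limit are illustrative choices, not settings confirmed by the model creators.

```python
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"
TASK_PROMPT = "<DocVQA>"


def build_prompt(question: str) -> str:
    """Prefix the question with the task prompt used during fine-tuning."""
    return TASK_PROMPT + question.strip()


def answer_question(image_path: str, question: str, max_new_tokens: int = 64) -> str:
    """Answer a question about one image (hypothetical helper, not part of the repo)."""
    # Heavy dependencies are imported lazily so the prompt helper can be used alone.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Assumption: the repo ships processor files; otherwise load the processor
    # from the base model "microsoft/Florence-2-base-ft" instead.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,  # illustrative limit
            num_beams=3,                    # illustrative beam count
        )

    # Drop special tokens and surrounding whitespace from the decoded answer.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

For example, `answer_question("logo.png", "Which company does this logo belong to?")` would return the model's answer as a string, where `logo.png` is a placeholder path.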

## Training procedure

- Images were resized and normalized according to Florence-2's preprocessing standards.
- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
- Questions and answers were provided for each image in the training set.
- Batch size: 4
- Evaluation metric: Cross-entropy loss on a held-out validation set
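
The early-stopping rule from the model description (patience of 2 epochs, delta of 0.0001, up to 7 epochs) can be sketched as below. This is one common interpretation, assuming the delta is an absolute improvement measured against the best validation loss so far; the actual training script may differ. The class `EarlyStopping`, the function `epochs_run`, and the loss values in the usage note are hypothetical.

```python
class EarlyStopping:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience: int = 2, min_delta: float = 1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, max_epochs: int = 7) -> int:
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping()
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

With made-up losses such as `[1.0, 0.8, 0.79995, 0.79999, 0.7]`, training would stop after epoch 4, because epochs 3 and 4 each improve on the best loss by less than 0.0001.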

For more information, please contact the model creators.