---
language:
- en
- zh
tags:
- florence-2
- document-vqa
- image-text-retrieval
- fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---
# adamchanadam/Test_Florence-2-FT-DocVQA
This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.
## Model description
- Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
- The model uses the `<DocVQA>` prompt to indicate the task type.
- Number of images: 2
- Number of epochs: 15
- Learning rate: 5e-06
- Optimizer: AdamW
- Early stopping: Patience of 3 epochs, delta of 0.01
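The early-stopping rule listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the training script used for this model: it assumes the monitored quantity is the validation loss and that "delta of 0.01" means an epoch counts as an improvement only if the loss drops by more than 0.01 below the best value so far. The names `EarlyStopping` and `epochs_run` are hypothetical.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.01):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum drop that counts as improvement (assumed meaning of "delta")
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, max_epochs=15, patience=3, min_delta=0.01):
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping(patience, min_delta)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

With these settings, a run whose validation loss plateaus stops three epochs after its last meaningful improvement, and a run that keeps improving ends at the 15-epoch cap.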
## Intended use & limitations
- Use for answering questions about logos and company information in images
- The fine-tuning data contains only 2 images, so performance is likely to be limited for questions or image content not represented in the training data
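A minimal inference sketch follows, assuming the checkpoint loads the same way as the base Florence-2 model (through `transformers` with `trust_remote_code=True`) and that the question is appended directly after the `<DocVQA>` task prompt. The helper names `build_prompt` and `answer_question` and the generation settings are illustrative assumptions, not something this card specifies.

```python
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"
TASK_PROMPT = "<DocVQA>"


def build_prompt(question):
    """Prefix the question with the task prompt (assumed format: prompt + question)."""
    return TASK_PROMPT + question


def answer_question(image_path, question, max_new_tokens=128):
    """Run one question against one image. Requires transformers, torch and Pillow."""
    # Imported lazily so the helper above can be used without the heavy dependencies.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,  # illustrative value
        num_beams=3,                    # illustrative value
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


if __name__ == "__main__":
    # "logo.png" is a placeholder path for a local image.
    print(answer_question("logo.png", "Which company does this logo belong to?"))
```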
## Training procedure
- Images were resized and normalized according to Florence-2's preprocessing standards.
- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
- Questions and answers were provided for each image in the training set.
- Batch size: 2
- Evaluation metric: Cross-entropy loss on a held-out validation set
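For reference, the evaluation metric named above can be written out in a few lines of plain Python. This is a sketch of mean token-level cross-entropy computed from raw logits, not the evaluation code used for this model. The function name `token_cross_entropy` is hypothetical, and skipping positions labelled `-100` follows the PyTorch default for ignored targets, which is assumed here rather than stated in this card.

```python
import math


def token_cross_entropy(logits, target_ids, ignore_index=-100):
    """Mean cross-entropy over target tokens.

    logits: list of per-position score lists (one score per vocabulary entry)
    target_ids: list of correct token indices, one per position
    """
    total, count = 0.0, 0
    for row, target in zip(logits, target_ids):
        if target == ignore_index:
            continue  # padding or masked position
        # Numerically stable log-sum-exp of the scores at this position
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability of the correct token
        total += log_sum_exp - row[target]
        count += 1
    return total / count if count else 0.0
```

A lower value means the model assigns higher probability to the reference answer tokens; uniform scores over a vocabulary of size V give a loss of log V.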
For more information, please contact the model creators.