---

language:
  - en
  - zh
tags:
  - florence-2
  - document-vqa
  - image-text-retrieval
  - fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---


# adamchanadam/Test_Florence-2-FT-DocVQA



This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.



## Model description



- Fine-tuned for answering questions about images, specifically focused on logo recognition and company information.

- The model uses the `<DocVQA>` prompt to indicate the task type.

- Number of images: 2

- Number of epochs: 15

- Learning rate: 5e-06

- Optimizer: AdamW

- Early stopping: patience of 3 epochs, minimum improvement (delta) of 0.01
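The early-stopping rule listed above can be sketched as follows. This is a minimal illustration, not the training script used for this model: the class name `EarlyStopping`, its method `step`, and the sample loss values are all hypothetical, and it assumes the monitored quantity is the validation loss and that "delta" means the minimum decrease that counts as an improvement.

```python
class EarlyStopping:
    """Stop training once validation loss fails to improve by `min_delta`
    for `patience` consecutive epochs (hypothetical helper)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement large enough to count: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, purely to exercise the rule.
stopper = EarlyStopping(patience=3, min_delta=0.01)
for epoch, loss in enumerate([1.00, 0.80, 0.795, 0.799, 0.792], start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")
        break
```

With a maximum of 15 epochs, a rule like this would end training earlier whenever three epochs in a row fail to lower the validation loss by at least 0.01.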



## Intended use & limitations



- Use for answering questions about logos and company information in images.

- Performance may be limited for questions or image content not represented in the training data.
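A minimal inference sketch is shown below, following the usual Florence-2 loading pattern in `transformers` (`trust_remote_code=True`). It assumes the checkpoint is published under the repository ID in this card's title; the helper names `build_docvqa_prompt` and `answer_question`, the generation settings, and the image path are illustrative assumptions rather than settings documented for this model.

```python
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"  # assumed from the card title
TASK_PROMPT = "<DocVQA>"


def build_docvqa_prompt(question: str) -> str:
    """Prefix the question with the task token used during fine-tuning."""
    return TASK_PROMPT + question.strip()


def answer_question(image_path: str, question: str, max_new_tokens: int = 64) -> str:
    """Run one question against one image (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the prompt helper stays usable alone.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_docvqa_prompt(question), images=image, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,  # assumed value; answers here are short
        num_beams=3,                    # assumed value
    )
    # Decode to plain text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

For example, `answer_question("logo.png", "Which company does this logo belong to?")` would return the model's answer as a string, where `logo.png` is a placeholder path. Given the small training set, answers for unseen logos or question styles should be treated with caution.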



## Training procedure



- Images were resized and normalized according to Florence-2's preprocessing standards.

- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.

- Questions and answers were provided for each image in the training set.

- Batch size: 2

- Evaluation metric: Cross-entropy loss on a held-out validation set
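One way the question-answer data described above could be organized is sketched here. It only illustrates prefixing each question with `<DocVQA>` and grouping examples into batches of 2; the `QAExample` structure, the helper names, and the sample file names, questions, and answers are hypothetical and are not taken from the actual training set.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

TASK_PROMPT = "<DocVQA>"
BATCH_SIZE = 2


@dataclass
class QAExample:
    """One question-answer pair attached to an image (hypothetical structure)."""
    image_path: str
    question: str
    answer: str


def to_model_text(example: QAExample) -> Tuple[str, str]:
    """Return (input text, target text) with the task prompt prepended."""
    return TASK_PROMPT + example.question, example.answer


def batches(examples: List[QAExample], batch_size: int = BATCH_SIZE) -> Iterator[List[QAExample]]:
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


# Placeholder data: several questions can share the same image.
examples = [
    QAExample("logo_a.png", "What company is this?", "Example Corp"),
    QAExample("logo_a.png", "What does the company sell?", "Example products"),
    QAExample("logo_b.png", "What company is this?", "Sample Ltd"),
]

for batch in batches(examples):
    for inputs, target in map(to_model_text, batch):
        print(inputs, "->", target)
```

In a real training loop, each batch's input texts and images would be passed through the Florence-2 processor and the target texts tokenized as labels, so that the cross-entropy loss can be computed on both the training and held-out validation examples.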



For more information, please contact the model creators.