metadata
language:
  - en
  - zh
tags:
  - florence-2
  - document-vqa
  - image-text-retrieval
  - fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft

adamchanadam/Test_Florence-2-FT-DocVQA

This model is fine-tuned from microsoft/Florence-2-base-ft for Document Visual Question Answering (DocVQA) tasks.

Model description

  • Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
  • The model uses the <DocVQA> prompt to indicate the task type.
  • Number of training images: 2
  • Number of epochs: 15
  • Learning rate: 5e-06
  • Optimizer: AdamW
  • Early stopping: patience of 3 epochs, minimum improvement (delta) of 0.01
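The early-stopping setting above can be read as: training halts once the monitored loss has gone 3 consecutive epochs without improving by at least 0.01. The sketch below assumes the monitored quantity is the validation loss and that the delta is an absolute threshold; the card states neither, and the names `EarlyStopping` and `epochs_run` are hypothetical, not taken from the actual training code.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class EarlyStopping:
    """Hypothetical early-stopping tracker.

    Assumes: monitored value is validation loss (lower is better) and
    min_delta is an absolute improvement threshold.
    """
    patience: int = 3
    min_delta: float = 0.01
    best_loss: float = math.inf
    epochs_without_improvement: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            # No improvement of at least min_delta this epoch.
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses: List[float], max_epochs: int = 15) -> int:
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping()
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

With the made-up losses `[1.00, 0.80, 0.795, 0.799, 0.792]`, the best loss stays at 0.80 because none of the later values beats it by 0.01, so `epochs_run` stops after epoch 5, well before the 15-epoch cap.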

Intended use & limitations

  • Intended for answering questions about logos and company information in images.
  • Because the model was fine-tuned on only two images, performance may be limited for questions or image content not represented in the training data.
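A minimal inference sketch for this use case follows. It assumes the usual Hugging Face `transformers` loading path for Florence-2 models (`AutoProcessor` and `AutoModelForCausalLM` with `trust_remote_code=True`), and it assumes the question is appended directly after the `<DocVQA>` token with no separator; the exact prompt layout used during fine-tuning is not documented here. The helper names `build_docvqa_prompt` and `answer_question` are hypothetical. If the repository does not ship processor files, passing `processor_id="microsoft/Florence-2-base-ft"` should load the base model's processor instead.

```python
def build_docvqa_prompt(question: str) -> str:
    """Prefix the question with the <DocVQA> task token.

    Assumption: no space or separator between the token and the question.
    """
    return "<DocVQA>" + question.strip()


def answer_question(
    image_path: str,
    question: str,
    model_id: str = "adamchanadam/Test_Florence-2-FT-DocVQA",
    processor_id: str = "",
    max_new_tokens: int = 64,
) -> str:
    """Hypothetical helper: run one DocVQA-style question against an image."""
    # Imported lazily so the prompt helper above works without these packages.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    processor = AutoProcessor.from_pretrained(processor_id or model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_docvqa_prompt(question), images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,
    )
    # Drop special tokens such as </s> and return the plain answer text.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

A call would look like `answer_question("logo.png", "What company does this logo belong to?")`, where `logo.png` is a placeholder path.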

Training procedure

  • Images were resized and normalized using Florence-2's standard preprocessing.
  • The <DocVQA> prompt was used during fine-tuning to indicate the task type.
  • Questions and answers were provided for each image in the training set.
  • Batch size: 2
  • Evaluation metric: Cross-entropy loss on a held-out validation set
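The data flow described above can be sketched in plain Python: each (image, question, answer) record becomes a `<DocVQA>`-prefixed prompt paired with its answer as the target, and the pairs are grouped into batches of 2. This is an illustration under assumptions, not the authors' pipeline: the record layout and all function names are hypothetical, and the prompt is assumed to be the task token followed directly by the question.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (image_path, question, answer).
Record = Tuple[str, str, str]
# Training pair: (image_path, prompt, target_text).
Pair = Tuple[str, str, str]

TASK_TOKEN = "<DocVQA>"


def to_training_pair(record: Record) -> Pair:
    """Turn one record into (image_path, prompt, target) for fine-tuning."""
    image_path, question, answer = record
    # Assumed prompt layout: task token immediately followed by the question.
    return image_path, TASK_TOKEN + question, answer


def make_batches(records: List[Record], batch_size: int = 2) -> List[List[Pair]]:
    """Group training pairs into consecutive batches; the last batch may be smaller."""
    pairs = [to_training_pair(r) for r in records]
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def max_optimizer_steps(num_pairs: int, batch_size: int = 2, epochs: int = 15) -> int:
    """Upper bound on optimizer steps: one step per batch, no gradient accumulation."""
    return math.ceil(num_pairs / batch_size) * epochs
```

If each of the 2 training images contributed a single question-answer pair (the card does not say how many pairs there were), one batch covers the whole training set, so each epoch is a single optimizer step and `max_optimizer_steps(2)` gives at most 15 steps before early stopping is considered.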

For more information, please contact the model creators.