---
language:
- en
- zh
tags:
- florence-2
- document-vqa
- image-text-retrieval
- fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---

# adamchanadam/Test_Florence-2-FT-DocVQA

This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.

## Model description

- Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
- The model uses the `<DocVQA>` prompt to indicate the task type.
- Number of unique images: 3
- Number of epochs: 15
- Learning rate: 5e-06
- Optimizer: AdamW
- Early stopping: patience of 3 epochs, delta of 0.01

Dataset statistics:

Total number of questions for fine-tuning: 40.

- logo_recognition: 4 (10.00%)
- brand_identification: 4 (10.00%)
- visual_elements: 4 (10.00%)
- text_in_logo: 4 (10.00%)
- industry_classification: 4 (10.00%)
- product_service: 4 (10.00%)
- company_details: 6 (15.00%)
- negative_sample: 10 (25.00%)

## Intended use & limitations

- Use for answering questions about logos and company information in images.
- With only 3 unique training images and 40 questions, performance is likely to be limited for questions or image content not represented in the training data.

## Training procedure

- Images were resized and normalized according to Florence-2's preprocessing standards.
- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
- Questions and answers were provided for each image in the training set.
- Batch size: 2
- Evaluation metric: cross-entropy loss on a held-out validation set

For more information, please contact the model creators.
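
The early-stopping rule listed under Model description (patience of 3 epochs, delta of 0.01, at most 15 epochs) can be illustrated with a minimal sketch. This is not the training code used for this model: the class name `EarlyStopping`, the helper `epochs_run`, and the choice to compare each epoch's validation loss against the best loss seen so far (an absolute rather than relative delta) are all assumptions made for illustration.

```python
class EarlyStopping:
    """Hypothetical helper: stop when validation loss has not improved by
    at least `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.01) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            # No sufficient improvement this epoch.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, max_epochs=15, patience=3, min_delta=0.01):
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping(patience=patience, min_delta=min_delta)
    losses = list(val_losses)[:max_epochs]
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(losses)
```

With these assumptions, a run whose validation loss plateaus after the second epoch would stop three epochs later, while a run that keeps improving by more than 0.01 per epoch would use the full 15 epochs.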