Florence-2 is a new vision foundation model by MSFT capable of a wide variety of tasks 🤯 Let's unpack! 🧶 Demo, models and more on the next one 🐣
![image_1](image_1.jpg)
This model can handle tasks ranging from document understanding to semantic segmentation 🤩
[Demo](https://t.co/7YJZvjhw84) | [Collection](https://t.co/Ub7FGazDz1)
![image_2](image_2.jpg)
The difference from previous models is that the authors have compiled a dataset that consists of 126M images with 5.4B annotations labelled with their own data engine ↓↓
![image_3](image_3.jpg)
The dataset also offers more variety in annotations than other datasets: it has both region-level and image-level annotations, with more variety in semantic granularity as well!
![image_4](image_4.jpg)
The model has a similar architecture to previous models: an image encoder, followed by a multimodal encoder with a text decoder. The authors have compiled the multitask dataset with a prompt for each task, which makes the model trainable on multiple tasks 🤗
![image_5](image_5.jpg)
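To make the "one prompt per task" idea concrete, here is a minimal sketch of how a task prompt could be built and sent to the model with `transformers`. The task tokens, the `microsoft/Florence-2-base` checkpoint name, and the `post_process_generation` call are taken from the public model card as best I recall, not from this thread, so treat them as assumptions. `build_prompt` and `run_task` are hypothetical helper names.

```python
# Task tokens as I understand them from the Florence-2 model card
# (an assumption here; the thread itself does not list them).
TASKS_WITHOUT_TEXT = {"<CAPTION>", "<DETAILED_CAPTION>", "<OD>", "<OCR>"}
TASKS_WITH_TEXT = {
    "<CAPTION_TO_PHRASE_GROUNDING>",
    "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text=None):
    """Return the text prompt for a task: the task token plus optional text."""
    if task in TASKS_WITHOUT_TEXT:
        if text:
            raise ValueError(f"{task} does not take extra text input")
        return task
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"{task} needs a text input")
        return task + text
    raise ValueError(f"unknown task token: {task}")


def load_florence2(checkpoint="microsoft/Florence-2-base"):
    """Load model and processor (checkpoint name is an assumption)."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)
    return model, processor


def run_task(model, processor, image, task, text=None, max_new_tokens=1024):
    """Run one task on a PIL image and return the parsed output."""
    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Parses the generated string into boxes, labels, text, etc. for the task.
    return processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )
```

With this sketch, switching from captioning to object detection only changes the `task` argument; the same weights handle both.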
You can also fine-tune this model on any task of your choice. The authors report results on downstream tasks, both with the vision encoder frozen and unfrozen 🤓📉
They have also released fine-tuned models; you can find them in the collection above 🤗
![image_6](image_6.jpg)
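Freezing the vision encoder during fine-tuning usually comes down to turning off gradients for its parameters. The helper below is a minimal sketch of that idea, not the authors' training code. `freeze_by_prefix` is a hypothetical name, and the `"vision_tower."` prefix is an assumption about how the Hugging Face implementation names the image encoder, so check `model.named_parameters()` before relying on it.

```python
def freeze_by_prefix(named_parameters, prefix):
    """Disable gradients for every parameter whose name starts with `prefix`.

    `named_parameters` is an iterable of (name, parameter) pairs, such as
    the result of `model.named_parameters()` in PyTorch.
    Returns the list of parameter names that were frozen.
    """
    frozen = []
    for name, param in named_parameters:
        if name.startswith(prefix):
            param.requires_grad = False  # optimizer updates will skip this tensor
            frozen.append(name)
    return frozen


# Usage with a loaded model (prefix is an assumption, see above):
# frozen = freeze_by_prefix(model.named_parameters(), "vision_tower.")
# print(f"froze {len(frozen)} tensors")
```

Leaving the encoder unfrozen trains every parameter, which costs more memory; the frozen variant updates only the remaining parameters.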
> [!TIP]
Resources:
[Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://arxiv.org/abs/2311.06242)
by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan (2023)
[Hugging Face blog post](https://huggingface.co/blog/finetune-florence2)
> [!NOTE]
[Original tweet](https://twitter.com/mervenoyann/status/1803769866878623819) (June 20, 2024)