
OneFormer: one model to segment them all? 🤯
I was browsing the Papers with Code leaderboards when I first came across OneFormer, so it was time to dig in!

OneFormer is a "truly universal" model for semantic, instance and panoptic segmentation tasks ⚔️
What makes it truly universal is that it's a single model that is trained only once and can be used across all three tasks 👇

The enabler here is text conditioning: the model is given a text query that states the task type along with the input image, and a contrastive loss teaches it to tell the different task types apart 👇
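
To make the idea a bit more concrete, here is a toy sketch of the two ingredients: building the task text and scoring query/text pairs with a contrastive loss. The prompt follows the paper's "the task is {task}" template. Everything else is my own illustration and not the authors' code: the helper names, the fixed temperature of 0.07, and the plain-Python symmetric InfoNCE-style loss are all assumptions.

```python
import math

TASKS = ("semantic", "instance", "panoptic")


def task_prompt(task: str) -> str:
    """Build the task text, following the paper's 'the task is {task}' template."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}, expected one of {TASKS}")
    return f"the task is {task}"


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(queries, texts, temperature=0.07):
    """Toy symmetric contrastive loss: the i-th query should match the i-th text.

    `temperature` is an assumed fixed value, chosen only for illustration.
    """
    q = [_normalize(v) for v in queries]
    t = [_normalize(v) for v in texts]
    n = len(q)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[_dot(q[i], t[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy_rows(matrix):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            mx = max(row)
            log_sum = mx + math.log(sum(math.exp(x - mx) for x in row))
            total += log_sum - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Query-to-text plus text-to-query direction.
    return cross_entropy_rows(logits) + cross_entropy_rows(transposed)
```

With matching pairs the loss is close to zero, and it grows once queries and texts are shuffled against each other. That gap is the signal that pushes queries for different tasks apart.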

Thanks to 🤗 Transformers, you can easily use the model! I have drafted a notebook for you to try right away 😊
You can also try the model in the Space without touching the code itself.
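
If you just want the gist, here is a minimal sketch of inference with 🤗 Transformers, where the same checkpoint handles all three tasks and only the task text changes. The checkpoint name `shi-labs/oneformer_ade20k_swin_tiny` is my pick for illustration (any OneFormer checkpoint on the Hub should work the same way), and the `segment` helper and `POST_PROCESS` mapping are hypothetical names, not part of the library.

```python
# Assumption: one of the public OneFormer checkpoints on the Hugging Face Hub.
CHECKPOINT = "shi-labs/oneformer_ade20k_swin_tiny"

# Each task has its own post-processing method on the processor.
POST_PROCESS = {
    "semantic": "post_process_semantic_segmentation",
    "instance": "post_process_instance_segmentation",
    "panoptic": "post_process_panoptic_segmentation",
}


def segment(image, task="semantic", checkpoint=CHECKPOINT):
    """Run one OneFormer checkpoint on a PIL image for the requested task."""
    if task not in POST_PROCESS:
        raise ValueError(f"unknown task {task!r}, expected one of {sorted(POST_PROCESS)}")

    # Heavy dependencies are imported lazily, only when inference is requested.
    import torch
    from transformers import OneFormerForUniversalSegmentation, OneFormerProcessor

    processor = OneFormerProcessor.from_pretrained(checkpoint)
    model = OneFormerForUniversalSegmentation.from_pretrained(checkpoint)

    # The task is passed as text next to the image: this is the text conditioning.
    inputs = processor(images=image, task_inputs=[task], return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    post_process = getattr(processor, POST_PROCESS[task])
    # target_sizes expects (height, width); PIL's image.size is (width, height).
    return post_process(outputs, target_sizes=[image.size[::-1]])[0]
```

Calling `segment(image, "semantic")` returns a per-pixel class map, while the instance and panoptic variants return a dict with a `segmentation` map and `segments_info`. For real use you would load the processor and model once instead of on every call.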

Resources:
OneFormer: One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi (2022) GitHub
Hugging Face documentation

Original tweet (December 26, 2023)