Hugging Face
Shakif
0 followers · 11 following
mohammed-shakeef-134977298
AI & ML interests
No-code AI, ML, AI automation, AI enthusiast
Recent Activity
reacted to merve's post with 👍 25 days ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with an MIT license 💗 https://huggingface.co/collections/ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093
> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️
> The models come in 1B, 4B and 8B sizes and are based on InternVL2.5 for the base architecture and Qwen2, Qwen2.5 and InternLM2 for the language model part (depending on the checkpoint)
> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image and video), then concatenates them to feed into the LLM 💬 The output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ⤵️
> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
upvoted a collection 25 days ago: Sa2VA model zoo
Organizations
None yet
Shakif's activity
upvoted a collection 25 days ago
Sa2VA model zoo
Collection • 4 items • Updated 21 days ago • 28