Hugging Face
40.8 TFLOPS
Mel Massadian
melmass
2 followers · 16 following
https://melmassadian.com
melmassadian
melMass
AI & ML interests
Building tools on top of Generative AI & LLM models
Recent Activity
liked a model 2 days ago: hexgrad/Kokoro-82M
liked a model 3 days ago: stabilityai/stable-point-aware-3d
reacted to merve's post with 🔥 3 days ago:
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license 💗 https://huggingface.co/collections/ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093
> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation), both for images and videos ⏯️
> The models come in 1B, 4B and 8B sizes and are based on InternVL2.5 for the base architecture and Qwen2, Qwen2.5 and InternLM2 for the language model part (depending on the checkpoint)
> The model is very interesting: it has a different encoder for each modality (visual prompt, text prompt, image and video), then concatenates their outputs to feed into the LLM 💬 The output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ⤵️
> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
Organizations
models 5
melmass/VideoVAEPlus • Updated 20 days ago
melmass/FCVG • Updated 20 days ago
melmass/pytorch-scripts • Updated Sep 21, 2024
melmass/audio-separation • Updated Aug 23, 2024
melmass/sdxl_loras • Text-to-Image • Updated Jun 18, 2024 • 2
datasets
None public yet