Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 43
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • Updated Dec 4, 2024 • 1.32M • 1.37k
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model May 14, 2024 • 243
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF Text Generation • Updated Jul 28, 2024 • 53.7k • 238