Joseph (Joseph717171)

AI & ML interests

None yet

Organizations

Hugging Face Discord Community

Joseph717171's activity

reacted to merterbak's post with 🤗🔥 4 days ago:

Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and a mixture-of-experts architecture. Two of the models are available now:
Model collection 🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - The first Llama models to use MoE, activating only a few experts per token for efficiency (see the routing sketch after this list)
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages, with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)
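
As a rough intuition for the "active vs. total parameters" numbers below, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and top_k are hypothetical placeholders, not Llama 4's actual configuration: a router scores each token, only the top-scoring experts run, so most parameters stay idle per token.

```python
# Illustrative top-k mixture-of-experts layer (hypothetical sizes, not Llama 4's).
# Each token is routed to its top_k highest-scoring experts; the rest stay idle,
# which is why "active parameters" per token are far fewer than total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)        # routing probabilities
        top_w, top_idx = probs.topk(self.top_k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                # tokens sent to expert e
                if mask.any():
                    out[mask] += top_w[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(8, 512)    # 8 tokens
print(TopKMoE()(x).shape)  # torch.Size([8, 512])
```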

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16-expert architecture
- 10M context window
- Fits on a single H100 GPU (with Int4 quantization)
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128-expert architecture
- Fits on a single DGX H100 host (8× H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- Elo score of 1417 on LMArena, currently the second-best model on the arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16-expert architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks
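
For anyone wanting to try the released checkpoints, here is a minimal sketch of text generation through the transformers pipeline. The repo id below is an assumption based on the collection linked above (check the collection page for the exact names), and the weights are gated, so you need to accept Meta's license on the Hub and be logged in first.

```python
# Minimal sketch: text generation with a Llama 4 checkpoint via transformers.
# The repo id is assumed from the collection above; verify it on the Hub.
# Requires: transformers, accelerate, and a logged-in Hub session with
# access to the gated meta-llama weights.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo id
    device_map="auto",   # shard layers across available GPUs
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"])
```
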
New activity in huggingchat/chat-ui 5 days ago

[MODELS] Discussion

#372 opened about 1 year ago by victor