
Jonna Matthiesen

JonnaMat

AI & ML interests

None yet

Recent Activity

liked a model about 19 hours ago

embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead

posted an update 1 day ago

⚡ FlashHead benchmarks for Llama 3.2, Gemma 3, and Qwen3 are now on https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks !
These are some of the models used in the FlashHead paper - now easier to explore and compare interactively.

🚀 Jetson AGX Thor (tok/s, batch=1):
- Llama-3.2-1B: 77 → 285 (FlashHead+W4A16, 3.7x)
- Llama-3.2-3B: 34 → 112 (3.3x)
- Gemma-3-1B: 79 → 153 (1.9x)
- Qwen3-1.7B: 49 → 189 (3.8x)
- Qwen3-0.6B: 140 → 177 (1.3x)

✅ Accuracy matches baseline on MMLU-Pro, IFEval, BBH, TruthfulQA, GSM8K.
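The speedup factors in the post are the ratio of optimized to baseline throughput. A minimal sketch of that arithmetic, using only the rounded tok/s figures quoted above; the names `BENCHMARKS` and `speedup` are illustrative and not taken from the FlashHead code or the benchmark space. Recomputing from rounded figures can differ from a posted ratio in the last digit (Qwen3-1.7B comes out at about 3.86x here), presumably because the posted ratios were derived from unrounded measurements; that explanation is an assumption.

```python
# Throughput in tok/s at batch=1 on Jetson AGX Thor, copied from the post.
# Each entry is (baseline, FlashHead+W4A16). Names here are illustrative.
BENCHMARKS = {
    "Llama-3.2-1B": (77, 285),
    "Llama-3.2-3B": (34, 112),
    "Gemma-3-1B": (79, 153),
    "Qwen3-1.7B": (49, 189),
    "Qwen3-0.6B": (140, 177),
}


def speedup(baseline: float, optimized: float) -> float:
    """Return the throughput ratio optimized / baseline."""
    return optimized / baseline


for model, (base, fast) in BENCHMARKS.items():
    print(f"{model}: {base} -> {fast} tok/s ({speedup(base, fast):.2f}x)")
```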

updated a model 1 day ago

embedl/Qwen3-1.7B-FlashHead-W4A16



Posts 9



Articles 3

Article

How to Build a vLLM Plugin: A Guide to the general_plugins Entry Point


models 0

None public yet

datasets 0

None public yet