⚡ FlashHead benchmarks for Llama 3.2, Gemma 3, and Qwen3 are now on embedl/Edge-Inference-Benchmarks!
These are some of the models used in the FlashHead paper - now easier to explore and compare interactively.
🚀 Jetson AGX Thor (tok/s, batch=1):
- Llama-3.2-1B: 77 → 285 (FlashHead+W4A16, 3.7x)
- Llama-3.2-3B: 34 → 112 (3.3x)
- Gemma-3-1B: 79 → 153 (1.9x)
- Qwen3-1.7B: 49 → 189 (3.8x)
- Qwen3-0.6B: 140 → 177 (1.3x)
✅ Accuracy matches baseline on MMLU-Pro, IFEval, BBH, TruthfulQA, GSM8K.
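As a quick sanity check, the quoted multipliers follow directly from the tok/s numbers above (speedup = FlashHead throughput / baseline throughput; small deviations are rounding). A minimal sketch:

```python
# Throughput figures (tok/s, batch=1) as reported in the post: (baseline, FlashHead).
results = {
    "Llama-3.2-1B": (77, 285),
    "Llama-3.2-3B": (34, 112),
    "Gemma-3-1B": (79, 153),
    "Qwen3-1.7B": (49, 189),
    "Qwen3-0.6B": (140, 177),
}

# Recompute each speedup from the raw numbers.
for model, (base, flash) in results.items():
    print(f"{model}: {base} -> {flash} tok/s ({flash / base:.1f}x)")
```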