Ed Addario PRO
eaddario
AI & ML interests
Finding ways to optimize LLMs' inference performance in resource-constrained environments (e.g. commodity hardware, desktops, laptops, mobiles, edge devices, etc.)
Recent Activity
posted an update 2 days ago
Experimental global target bits-per-weight quantization of mistralai/Ministral-3-14B-Instruct-2512 and mistralai/Ministral-3-14B-Reasoning-2512
Unlike standard llama.cpp quantizations, which rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach allocates per-tensor precision where it matters most and produces high-quality models that meet a precise global file size target.
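To make the link between a file size target and a bits-per-weight (BPW) target concrete, here is a rough back-of-the-envelope sketch. The parameter count (14e9) and the 8 GiB budget are illustrative assumptions, and the arithmetic ignores GGUF metadata and any runtime memory such as the KV cache.

```python
def target_bpw(size_budget_bytes: float, n_params: float) -> float:
    """Average bits per weight that fills the byte budget."""
    return size_budget_bytes * 8 / n_params


def file_size_bytes(bpw: float, n_params: float) -> float:
    """Approximate size of the weights alone at a given average BPW."""
    return bpw * n_params / 8


n_params = 14e9           # assumed parameter count, for illustration only
budget = 8 * 1024**3      # hypothetical 8 GiB budget for the weights
bpw = target_bpw(budget, n_params)
print(f"target average: {bpw:.2f} bits per weight")
```

The same relation works in reverse: pick an average BPW and `file_size_bytes` gives the approximate size of the weights.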
Key Advantages:
- VRAM Maximization: Generates high-quality models sized exactly to fit a hardware constraint (e.g., a 24GB VRAM budget).
- Data-Driven Precision: The quantization mix is determined by measured weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-to-size trade-offs.
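As a rough illustration of the idea (not the actual implementation, which is described in the model cards), a budgeted allocation can be sketched as a greedy loop: start every tensor at the lowest precision, then repeatedly upgrade the tensor that removes the most error per extra bit, as long as the global budget allows. The tensor names, candidate BPW levels, and error values below are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical precision levels, in bits per weight (not real llama.cpp types).
CANDIDATE_BPW = [2.0, 4.0, 8.0]


@dataclass
class Tensor:
    name: str
    n_weights: int
    error: list[float]  # error[i] = measured error at CANDIDATE_BPW[i]


def allocate(tensors: list[Tensor], target_bpw: float):
    """Greedy sketch: spend the bit budget where it reduces error the most."""
    total_weights = sum(t.n_weights for t in tensors)
    budget_bits = target_bpw * total_weights
    level = {t.name: 0 for t in tensors}
    used_bits = sum(t.n_weights * CANDIDATE_BPW[0] for t in tensors)
    if used_bits > budget_bits:
        raise ValueError("target is below the lowest candidate precision")

    while True:
        best, best_gain, best_extra = None, 0.0, 0.0
        for t in tensors:
            i = level[t.name]
            if i + 1 >= len(CANDIDATE_BPW):
                continue  # already at the highest precision
            extra = t.n_weights * (CANDIDATE_BPW[i + 1] - CANDIDATE_BPW[i])
            if used_bits + extra > budget_bits:
                continue  # this upgrade would exceed the global budget
            gain = (t.error[i] - t.error[i + 1]) / extra  # error removed per bit
            if gain > best_gain:
                best, best_gain, best_extra = t, gain, extra
        if best is None:
            break  # no affordable upgrade still reduces error
        level[best.name] += 1
        used_bits += best_extra

    plan = {name: CANDIDATE_BPW[i] for name, i in level.items()}
    return plan, used_bits / total_weights


tensors = [
    Tensor("attn", 100, [9.0, 3.0, 1.0]),  # sensitive: error drops fast
    Tensor("ffn", 300, [6.0, 4.0, 3.5]),   # larger and less sensitive
]
plan, achieved_bpw = allocate(tensors, target_bpw=4.0)
print(plan, achieved_bpw)
```

In this toy run the small, sensitive tensor is upgraded first and the achieved average never exceeds the target. A simple greedy pass like this can leave part of the budget unused; hitting a size target precisely would need a finer-grained search.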
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards:
https://huggingface.co/eaddario/Ministral-3-14B-Instruct-2512-GGUF
https://huggingface.co/eaddario/Ministral-3-14B-Reasoning-2512-GGUF
updated a model 2 days ago
eaddario/Ministral-3-14B-Instruct-2512-GGUF
updated a model 2 days ago
eaddario/Ministral-3-14B-Reasoning-2512-GGUF