Building on HF
AI & ML interests
Finding ways to optimize LLM inference performance in resource-constrained environments (e.g., commodity hardware: desktops, laptops, mobiles, and edge devices)
Recent Activity
- Cohere on Hugging Face Inference Providers 🔥
- Making LLMs Smaller Without Breaking Them: A GLU-Aware Pruning Approach