---
title: Auto Gguf Quant
emoji: 🐠
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
short_description: Automatically quantizes Sculptor models
---

# Ursa Minor Quantization Monitor

This Space automatically generates quantized versions of the `Sculptor-AI/Ursa_Minor` model and uploads them to the `Sculptor-AI/Ursa_Minor_Quantized` repository.

## Features

- Monitors the source repository for updates (see the polling sketch below)
- Automatically generates quantized versions when the source model is updated
- Displays a progress bar during quantization
- Shows an "up to date" indicator when all quantizations are complete
- Handles out-of-memory errors gracefully
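Update detection can be as simple as polling the latest commit SHA of the source repository. The following Python sketch is illustrative, not the Space's actual code; the `check_for_update` helper and the 10-minute interval are assumptions:

```python
import time

from huggingface_hub import HfApi

SOURCE_REPO = "Sculptor-AI/Ursa_Minor"  # repository being monitored

def check_for_update(api: HfApi, last_sha: str | None) -> str | None:
    """Return the new commit SHA if the source repo changed, else None."""
    sha = api.model_info(SOURCE_REPO).sha  # latest commit on the default branch
    return sha if sha != last_sha else None

api = HfApi()
last_sha = None
while True:
    new_sha = check_for_update(api, last_sha)
    if new_sha is not None:
        print(f"Source updated to {new_sha}; regenerating quantizations...")
        last_sha = new_sha
    time.sleep(600)  # poll every 10 minutes
```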

## Quantization Types

The following quantizations are generated in order, from smallest to largest (see the sketch after the table):

| Type | Size (GB) | Notes |
|------|-----------|-------|
| GGUF Q2_K | 0.8 | |
| GGUF Q3_K_S | 0.9 | |
| GGUF Q3_K_M | 0.9 | lower quality |
| GGUF Q3_K_L | 1.0 | |
| GGUF IQ4_XS | 1.0 | |
| GGUF Q4_K_S | 1.0 | fast, recommended |
| GGUF Q4_K_M | 1.1 | fast, recommended |
| GGUF Q5_K_S | 1.2 | |
| GGUF Q5_K_M | 1.2 | |
| GGUF Q6_K | 1.4 | very good quality |
| GGUF Q8_0 | 1.7 | fast, best quality |
| GGUF f16 | 3.2 | 16 bpw, overkill |
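A minimal sketch of the generation loop, assuming the quantizations are produced with llama.cpp's `llama-quantize` tool from an f16 GGUF (the file names and the tool being on `PATH` are assumptions):

```python
import subprocess

# Order matches the table above, smallest to largest; the strings are
# llama.cpp quantization type names.
QUANT_TYPES = [
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "IQ4_XS", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
]

def quantize_all(f16_path: str) -> None:
    for qtype in QUANT_TYPES:
        out_path = f16_path.replace("f16", qtype)  # e.g. Ursa_Minor.Q2_K.gguf
        # llama-quantize <input.gguf> <output.gguf> <type>
        subprocess.run(["llama-quantize", f16_path, out_path, qtype], check=True)

quantize_all("Ursa_Minor.f16.gguf")
```

Generating smallest-to-largest means the quants most likely to succeed on limited memory are finished and uploaded first.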

## Setup

To run this Space, you need to set an `HF_TOKEN` environment variable (e.g. as a Space secret) with write access to the destination repository.
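Once the secret is set, the app can read the token from the environment and push each finished file with `huggingface_hub`. A minimal sketch; the file name is a placeholder:

```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])  # token needs write access

# Upload one finished quantization to the destination repository.
api.upload_file(
    path_or_fileobj="Ursa_Minor.Q4_K_M.gguf",  # locally generated quant
    path_in_repo="Ursa_Minor.Q4_K_M.gguf",     # destination path in the repo
    repo_id="Sculptor-AI/Ursa_Minor_Quantized",
)
```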

## Note About the Free Compute Tier

The Hugging Face free compute tier has limited memory. This Space is designed to handle out-of-memory errors gracefully, but larger quantizations may fail due to memory constraints. If you need to generate larger quantizations, consider upgrading to a paid compute tier.
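"Gracefully" here can mean wrapping each quantization attempt so that one failure does not crash the whole run. A sketch under the assumption that each quantization runs in a `llama-quantize` subprocess:

```python
import subprocess

def quantize_safely(f16_path: str, out_path: str, qtype: str) -> bool:
    """Run one quantization; report failure instead of crashing the Space."""
    try:
        subprocess.run(["llama-quantize", f16_path, out_path, qtype], check=True)
        return True
    except subprocess.CalledProcessError:
        # A process killed by the OOM killer exits nonzero, which check=True
        # turns into this exception; larger quants can simply be skipped.
        print(f"{qtype} failed (likely out of memory); skipping it")
        return False
```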