---
title: Auto Gguf Quant
emoji: 🐠
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
short_description: Automatically quantizes Sculptor models
---
# Ursa Minor Quantization Monitor
This Space automatically generates quantized versions of the `Sculptor-AI/Ursa_Minor` model and uploads them to the `Sculptor-AI/Ursa_Minor_Quantized` repository.
## Features
- Monitors the source repository for updates
- Automatically generates quantized versions when the source model is updated
- Displays a progress bar during quantization
- Shows an "up to date" indicator when all quantizations are complete
- Handles out-of-memory errors gracefully
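The monitoring and error-handling behavior above can be sketched roughly as follows. This is a hypothetical illustration, not the Space's actual code: `run_pending` and its `quantize` callback are invented names, and the quant list is an illustrative subset.

```python
def run_pending(latest_sha, done_sha, quantize):
    """Re-quantize when the source repo has moved past the commit we
    last processed; skip any type that exhausts memory.

    quantize(name) is a placeholder for the real per-type worker.
    """
    if latest_sha == done_sha:
        return "up to date"  # nothing new in the source repository
    completed = []
    for name in ("Q2_K", "Q4_K_M", "Q8_0"):  # illustrative subset
        try:
            quantize(name)
            completed.append(name)
        except MemoryError:
            # larger types may not fit in the free tier's memory;
            # keep going so the smaller ones still get produced
            continue
    return completed
```

Running smaller types first means an out-of-memory failure on a large quant never blocks the ones that do fit.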
## Quantization Types
The following quantizations are generated in order from smallest to largest:
| Type | Size (GB) | Notes |
|---|---|---|
| GGUF Q2_K | 0.8 | |
| GGUF Q3_K_S | 0.9 | |
| GGUF Q3_K_M | 0.9 | lower quality |
| GGUF Q3_K_L | 1.0 | |
| GGUF IQ4_XS | 1.0 | |
| GGUF Q4_K_S | 1.0 | fast, recommended |
| GGUF Q4_K_M | 1.1 | fast, recommended |
| GGUF Q5_K_S | 1.2 | |
| GGUF Q5_K_M | 1.2 | |
| GGUF Q6_K | 1.4 | very good quality |
| GGUF Q8_0 | 1.7 | fast, best quality |
| GGUF f16 | 3.2 | 16 bpw, overkill |
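The smallest-to-largest ordering can be derived directly from the sizes in the table. A minimal sketch (`generation_order` is a hypothetical helper, not part of the Space's actual code):

```python
# (name, approx size in GB) taken from the table above
QUANT_TYPES = [
    ("Q2_K", 0.8), ("Q3_K_S", 0.9), ("Q3_K_M", 0.9), ("Q3_K_L", 1.0),
    ("IQ4_XS", 1.0), ("Q4_K_S", 1.0), ("Q4_K_M", 1.1), ("Q5_K_S", 1.2),
    ("Q5_K_M", 1.2), ("Q6_K", 1.4), ("Q8_0", 1.7), ("f16", 3.2),
]

def generation_order(quants=QUANT_TYPES):
    """Return quant type names sorted smallest-first; sorted() is stable,
    so types with equal sizes keep their table order."""
    return [name for name, _ in sorted(quants, key=lambda q: q[1])]
```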
## Setup
To run this Space, you need to set an `HF_TOKEN` environment variable with write access to the destination repository.
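A minimal sketch of reading that secret at startup, failing fast with a clear message when it is missing (`get_write_token` is a hypothetical helper, not the Space's actual code):

```python
import os

def get_write_token(env=os.environ):
    """Fetch the HF_TOKEN the Space needs for authenticated uploads."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "Set the HF_TOKEN Space secret to a token with write access "
            "to the destination repository")
    return token
```

On Hugging Face Spaces, secrets set in the Space settings are exposed to the app as environment variables, which is why a plain `os.environ` lookup suffices.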
## Note About Free Compute Tier
The Hugging Face free compute tier has limited memory. This Space is designed to handle out-of-memory errors gracefully, but larger quantizations may fail due to memory constraints. If you need to generate larger quantizations, consider upgrading to a paid compute tier.