Ursa_Minor-GGUF

Official GGUF quantizations of Sculptor-AI/Ursa_Minor.

About

This repository contains the official quantized versions of Ursa_Minor, created by Sculptor-AI. The quantizations below cover a range of size/quality trade-offs, from the 0.8 GB Q2_K up to the full-precision F16.

Model Description

Ursa_Minor is a reasoning-focused language model developed by ExplodingCB2 & Kaileh57 (Sculptor-AI). It is designed to tackle complex reasoning tasks, demonstrating capabilities in multi-step inference, logical deduction, and contextual understanding.

Key Features:

  • Reasoning Prowess: Emphasizes strong reasoning abilities over sheer memorization
  • Multi-Step Inference: Breaks down complex problems into smaller, manageable steps
  • Logical Deduction: Applies logical rules and principles to arrive at valid conclusions
  • Contextual Understanding: Grasps and utilizes contextual information to enhance reasoning accuracy

Usage

If you are unsure how to use GGUF files, here are some common ways to run the model:

llama.cpp

./llama-cli -m /path/to/Ursa_Minor.Q4_K_M.gguf -n 512 -p "What are the prime factors of 42?"

Note: older llama.cpp builds name this binary ./main instead of ./llama-cli.
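To fetch a single quantization first, here is a minimal sketch using the huggingface-cli tool from the huggingface_hub package (the file name is taken from the table below; verify it against the repository's file list):

pip install -U "huggingface_hub[cli]"
# Download one GGUF file from this repo into the current directory
huggingface-cli download Sculptor-AI/Ursa_Minor-GGUF Ursa_Minor.Q4_K_M.gguf --local-dir .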

Text-generation-webui

Load the model in text-generation-webui by selecting the GGUF file from your models directory.
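For example, assuming a default text-generation-webui checkout, copying the file into its models/ folder makes it appear in the model dropdown (the paths here are assumptions):

# Place the downloaded quantization where the webui looks for models
cp Ursa_Minor.Q4_K_M.gguf text-generation-webui/models/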

LM Studio

Import the GGUF file directly in LM Studio to run the model locally.
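llama.cpp also ships a llama-server binary that exposes an OpenAI-compatible HTTP endpoint; a minimal sketch for serving and querying the model (the port and file path are assumptions):

# Start the server on port 8080
./llama-server -m /path/to/Ursa_Minor.Q4_K_M.gguf --port 8080

# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What are the prime factors of 42?"}]}'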

Provided Quantizations

The following quantizations are available, sorted from smallest to largest file size:

| Type   | Size   | Quality      | Inference Speed | Notes                                         |
|--------|--------|--------------|-----------------|-----------------------------------------------|
| Q2_K   | 0.8 GB | Basic        | Very Fast       | Smallest size, acceptable for basic tasks     |
| Q3_K_S | 0.9 GB | Improved     | Fast            | Good balance for limited resources            |
| Q3_K_M | 0.9 GB | Improved+    | Fast            | Slightly better than Q3_K_S                   |
| Q3_K_L | 1.0 GB | Good         | Moderate        | Recommended for good quality with small size  |
| IQ4_XS | 1.0 GB | Good+        | Moderate        | Improved quantization technique               |
| Q4_K_S | 1.0 GB | Very Good    | Fast            | Recommended for most users                    |
| Q4_K_M | 1.1 GB | Very Good+   | Fast            | Recommended for daily use                     |
| Q5_K_S | 1.2 GB | Excellent    | Moderate        | High-quality output                           |
| Q5_K_M | 1.2 GB | Excellent+   | Moderate        | Enhanced quality over Q5_K_S                  |
| Q6_K   | 1.4 GB | Superior     | Moderate        | Very high quality, close to F16               |
| Q8_0   | 1.7 GB | Near-perfect | Slow            | Almost indistinguishable from F16             |
| F16    | 3.2 GB | Perfect      | Very Slow       | No quantization, full precision               |

Recommendations

  • For most users: Q4_K_M provides an excellent balance of quality and size
  • For limited resources: Q3_K_L or Q3_K_S offer good performance at smaller sizes
  • For best quality: Q6_K or Q8_0 provide near-original model quality

Quantization Quality Comparison

Here's a comparison of different quantization types (lower is better):

[Image: quantization comparison chart]

For more detailed information about quantization techniques and their effects, see Artefact2's analysis.

Community and Support

If you have questions or need support with these quantized models, please open a discussion on our community page.

Acknowledgments

We thank the community for their support and feedback in helping us improve and optimize these model quantizations.
