24GB VRAM Optimal Quants
When asked what I use locally on a 24GB card, this collection is what I point to. I favor exl2 quants for long context and GGUF for very short context.
A simple SLERP merge of TheDrummer's Star-Command with its base model, to tone it down and "keep" more of Command-R.
A 4bpw exl2 quantization made for use with ExLlamaV2 on 24GB GPUs.
https://huggingface.co/TheDrummer/Star-Command-R-32B-v1
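For reference, loading a 4bpw exl2 quant under ExLlamaV2 looks roughly like the sketch below. The model path is a placeholder, and `max_seq_len` is an assumption you should tune to whatever KV cache fits alongside the weights in 24GB.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point at the downloaded exl2 quant (placeholder path).
config = ExLlamaV2Config()
config.model_dir = "/models/star-command-4bpw-exl2"
config.prepare()
config.max_seq_len = 32768  # trim this if the KV cache overflows 24GB

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # loads layer by layer to fit available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("The quick brown fox", settings, 64))
```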
Created using mergekit.
This model was merged using the SLERP merge method.
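SLERP interpolates each pair of weight tensors along the great-circle arc between them rather than along a straight line; with t = 0.5 the result sits at the angular midpoint of the two models. A minimal sketch of the idea (mergekit's actual implementation differs in its normalization and edge-case handling):

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two weight vectors.
    cos_theta = torch.dot(a, b) / (a.norm() * b.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta < eps:
        # Nearly colinear tensors: SLERP degenerates to plain LERP.
        return (1 - t) * v0 + t * v1
    s = torch.sin(theta)
    out = (torch.sin((1 - t) * theta) / s) * a + (torch.sin(t * theta) / s) * b
    return out.reshape(v0.shape).to(v0.dtype)
```

With t = 0.5 for every parameter, as in the config below, each merged tensor lands halfway between Star-Command and base Command-R.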
The following models were included in the merge:
- TheDrummer/Star-Command-R-32B-v1
- CohereForAI/c4ai-command-r-08-2024
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: TheDrummer/Star-Command-R-32B-v1
  - model: CohereForAI/c4ai-command-r-08-2024
merge_method: slerp
parameters:
  t:
    - value: 0.5
base_model: CohereForAI/c4ai-command-r-08-2024
dtype: bfloat16
```
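To reproduce the merge, save the config (the filename `config.yaml` below is a placeholder) and run it with mergekit. A sketch of the Python route, following mergekit's documented API:

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the SLERP config shown above.
with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Write the merged bfloat16 model to ./merged; CUDA speeds up the tensor math.
run_merge(
    merge_config,
    "./merged",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```

The CLI equivalent is `mergekit-yaml config.yaml ./merged --cuda`.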