|
--- |
|
language: |
|
- en |
|
license: other |
|
library_name: transformers |
|
tags: |
|
- chat |
|
- qwen |
|
- qwen2.5 |
|
- finetune |
|
- english |
|
base_model: |
|
- MaziyarPanahi/calme-3.2-instruct-78b |
|
model_name: calme-3.2-instruct-78b |
|
license_name: qwen |
|
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE |
|
pipeline_tag: text-generation |
|
inference: false |
|
model_creator: MaziyarPanahi |
|
quantized_by: MaziyarPanahi |
|
model-index: |
|
- name: calme-3.2-instruct-78b |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: HuggingFaceH4/ifeval |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 80.63 |
|
name: strict accuracy |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: BBH |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 62.61 |
|
name: normalized accuracy |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: hendrycks/competition_math |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 39.95 |
|
name: exact match |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 20.36 |
|
name: acc_norm |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 38.53 |
|
name: acc_norm |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 70.03 |
|
name: accuracy |
|
source: |
|
url: >- |
|
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
# EXL2 4.5bpw Quantization of calme-3.2-instruct-78b |
|
|
|
<img src="./calme_3.png" alt="Calme-3 Models" width="200" style="margin-left: auto; margin-right: auto; display: block;"/>
|
|
|
This repository hosts the **4.5 bits per weight (bpw)** quantization of [calme-3.2-instruct-78b](https://huggingface.co/MaziyarPanahi/calme-3.2-instruct-78b), a Qwen2.5 finetune, in the **ExLlamaV2 (EXL2)** format for efficient long-context inference.
|
|
|
## Quantization Details |
|
- **Format:** ExLlamaV2 4.5bpw |
|
- **Version:** ExLlamaV2 0.2.6 |
|
- **Model Size:** 78 billion parameters |
|
- **VRAM Usage:** Approx. **44 GB** at a 32,000-token context
|
- **Calibration:** |
|
- Rows: 115 |
|
- Length: 2048 |
|
- Dataset: ExLlamaV2 built-in default
|
|
|
Quantizing to 4.5 bpw substantially reduces memory usage and inference latency compared with the full-precision weights, while maintaining strong quality on generative text tasks.
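As a rough sanity check on the VRAM figure above, the weight footprint can be estimated directly from the parameter count and the bitrate (a back-of-the-envelope estimate, not a measurement):

```python
# Rough EXL2 weight footprint: parameters x bits-per-weight / 8 bits-per-byte.
params = 78e9  # parameter count
bpw = 4.5      # bits per weight after quantization

weight_gb = params * bpw / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~43.9 GB; KV cache and buffers add more
```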
|
|
|
## Prompt Template |
|
This model uses the ChatML prompt template for interaction: |
|
|
|
```
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
```
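For programmatic use, the template can be assembled with a small helper; the `chatml_prompt` function below is illustrative, not part of any library:

```python
# Illustrative helper that renders a system and user message into ChatML.
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# The assistant turn is left open so the model writes the completion.
prompt = chatml_prompt("You are a helpful assistant.", "What is EXL2 quantization?")
```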
|
|
|
## Model Usage |
|
|
|
### Example: Inference with ExLlamaV2 |
|
To use this quantized model, ensure you have the **ExLlamaV2** library installed: |
|
|
|
```bash
pip install exllamav2
```
|
|
|
The snippet below follows the ExLlamaV2 0.2.x loading and dynamic-generator API; the model path is an example and should point at your local copy of this repository (see Download Instructions below).

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Load model and tokenizer from a local download of this repository
config = ExLlamaV2Config("./local-folder")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # distribute layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

# Create generator
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Generate text using the ChatML template
prompt = (
    "<|im_start|>user\n"
    "What is EXL2 quantization?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(generator.generate(prompt=prompt, max_new_tokens=256))
```
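The `lazy=True` cache together with `load_autosplit` lets ExLlamaV2 spread the weights across all visible GPUs automatically, which is usually necessary for a model of this size.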
|
|
|
## Features |
|
- The EXL2 format requires NVIDIA hardware, but inference is faster and uses less memory than comparable GGUF quantizations.

- Fits a **32,000-token context** in roughly **44 GB** of VRAM.

- Requires a minimum of about **40 GB** of VRAM at a **1,024-token context**.

- Highly optimized for inference, making a 78B-parameter model practical on VRAM-constrained hardware.

- Compatible with ChatML-based prompting systems.
|
|
|
## Acknowledgments |
|
- **Original Model Creator:** [MaziyarPanahi](https://huggingface.co/MaziyarPanahi) |
|
- **Quantization by:** [DavidCatalano](https://huggingface.co/DavidCatalano) |
|
- **Quantization Tool:** ExLlamaV2 0.2.6 |
|
|
|
## Download Instructions |
|
To download the model files: |
|
|
|
```bash
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw --include "*" --local-dir ./local-folder
```
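Alternatively, the same files can be fetched from Python with `huggingface_hub` (the target directory is an example):

```python
from huggingface_hub import snapshot_download

# Download all files in the repository to a local folder.
snapshot_download(
    repo_id="DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw",
    local_dir="./local-folder",
)
```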
|
|
|
|
|
--- |