---
inference: true
library_name: transformers
tags:
- fluently-lm
- fluently
- prinum
- instruct
- trained
- math
- roleplay
- reasoning
- axolotl
- unsloth
- argilla
- qwen2
license: mit
language:
- en
- fr
- es
- ru
- zh
- ja
- fa
- code
datasets:
- fluently-sets/ultraset
- fluently-sets/ultrathink
- fluently-sets/reasoning-1-1k
- fluently-sets/MATH-500-Overall
pipeline_tag: text-generation
---

# **FluentlyLM Prinum** (32B version)

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, tried different approaches, and eventually settled on the optimal one.

## Model Details

### Model Description

- **Developed by:** [@fluently-lm](https://hf.co/fluently-lm)
- **Model type:** Causal Language Model (QwenForCausalLM, LM Transformer)
- **Number of Parameters:** 32.5B
- **Number of Parameters (Non-Embedding):** 31.0B
- **Number of Layers:** 64
- **Number of Attention Heads (GQA):** 40 for Q and 8 for KV
- **Context Length:** Full 131,072 tokens
- **Language(s) (NLP):** English, French, Spanish, Russian, Chinese, Japanese, Persian *(official support)*
- **License:** MIT

### Quickstart

The code snippet below uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fluently-lm/FluentlyLM-Prinum"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize the prompt
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens from the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

#### GGUF usage

You can also run the model locally from a GGUF file in various interfaces and workflows (see the llama-cpp-python sketch at the end of this card). We offer several repositories for downloading GGUF quants:

- [mradermacher/FluentlyLM-Prinum-GGUF](https://huggingface.co/mradermacher/FluentlyLM-Prinum-GGUF) (all GGUF quants)
- [fluently-lm/FluentlyLM-Prinum-Q4_K_M-GGUF](https://huggingface.co/fluently-lm/FluentlyLM-Prinum-Q4_K_M-GGUF) (only the Q4_K_M quant) *(coming soon...)*

### Model recipe

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a3d8d58448f47df24c041a/QIkaMeP8FhcbJuvCH2GwF.png)

### Evaluation

**🏆 12th place on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#)**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a3d8d58448f47df24c041a/kGPerdFRuwCkzJCzxC7dE.png)

## Special thanks

🤗 We are grateful for the open-source resources, technologies, and assistance from: [Unsloth AI](https://unsloth.ai), [Axolotl AI](https://axolotl.ai), [Argilla](https://argilla.io), [Alibaba Cloud: Qwen](https://qwenlm.ai), [NVIDIA](https://huggingface.co/nvidia) and [NousResearch](https://nousresearch.com).
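
## GGUF usage example (llama-cpp-python)

As a supplement to the GGUF section above, here is a minimal sketch of loading one of the quants with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The choice of runtime, the `*Q4_K_M.gguf` filename pattern, and the context-length setting are assumptions for illustration, not official instructions; adjust them to match the files actually published in the GGUF repositories.

```py
# Sketch only: assumes llama-cpp-python and huggingface_hub are installed
# and that the repo contains a file matching the glob below.
from llama_cpp import Llama

# Download a quant from the Hub and load it (filename pattern is an assumption)
llm = Llama.from_pretrained(
    repo_id="mradermacher/FluentlyLM-Prinum-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,  # example context window; the model supports up to 131,072 tokens
)

messages = [
    {"role": "system", "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant."},
    {"role": "user", "content": "Write a quick sort algorithm."},
]

# Chat-style generation using the chat template embedded in the GGUF metadata
out = llm.create_chat_completion(messages=messages, max_tokens=1024)
print(out["choices"][0]["message"]["content"])
```

Any other GGUF-compatible frontend (llama.cpp CLI, LM Studio, Ollama, etc.) should work the same way once pointed at the downloaded file.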