metadata

inference: true
library_name: transformers
tags:
  - fluently-lm
  - fluently
  - prinum
  - instruct
  - trained
  - math
  - roleplay
  - reasoning
  - axolotl
  - unsloth
  - argilla
  - qwen2
license: mit
language:
  - en
  - fr
  - es
  - ru
  - zh
  - ja
  - fa
  - code
datasets:
  - fluently-sets/ultraset
  - fluently-sets/ultrathink
  - fluently-sets/reasoning-1-1k
  - fluently-sets/MATH-500-Overall
pipeline_tag: text-generation

FluentlyLM Prinum (32B-version)

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, used different approaches, and eventually found the optimal one.

Model Details

Model Description

Developed by: @fluently-lm
Model type: Causal Language Model (QwenForCausalLM, LM Transformer)
Number of Parameters: 32.5B
Number of Parameters (Non-Embedding): 31.0B
Number of Layers: 64
Number of Attention Heads (GQA): 40 for Q and 8 for KV
Context Length: Full 131,072 tokens
Language(s) (NLP): English, French, Spanish, Russian, Chinese, Japanese, Persian (official support)
License: MIT
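
As a rough guide to hardware requirements, the figures above can be turned into a back-of-the-envelope estimate. The sketch below uses only the parameter and head counts listed in this section. The bytes-per-parameter values are assumptions about common storage formats, not measurements of the released files, and the estimate ignores activations, the KV cache, and quantization overhead.

```python
# Back-of-the-envelope numbers derived from the model details listed above.
# BYTES_PER_PARAM holds assumed storage sizes for common formats; real
# checkpoints and quantized files will differ somewhat.
TOTAL_PARAMS = 32.5e9          # "Number of Parameters: 32.5B"
NON_EMBEDDING_PARAMS = 31.0e9  # "Number of Parameters (Non-Embedding): 31.0B"
Q_HEADS, KV_HEADS = 40, 8      # "40 for Q and 8 for KV"

BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameters that live in the embedding tables.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = Q_HEADS // KV_HEADS

estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

print(f"Embedding parameters: {embedding_params / 1e9:.1f}B")
print(f"Query heads per KV head: {queries_per_kv_head}")
for name, gb in estimates.items():
    print(f"{name:>8}: ~{gb:.1f} GB for weights alone")
```

Under these assumptions, loading the weights in 16-bit precision takes about 65 GB, which is why the Quickstart relies on device_map="auto" to spread the model across the available devices.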

Quickstart

The following code snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fluently-lm/FluentlyLM-Prinum"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
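# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the generated tokens back into text and print the reply.
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)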
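
The slicing step near the end of the snippet is easy to misread, so here is a minimal illustration on plain Python lists. For a decoder-only model, generate returns each prompt followed by its continuation, and the list comprehension cuts the prompt off again. The helper name strip_prompt_tokens and the token IDs are made up for this example.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Return only the newly generated tokens for each sequence in a batch.

    Each output sequence is assumed to start with its own prompt tokens,
    so slicing from len(prompt) onward leaves just the continuation.
    """
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Toy token IDs, invented for illustration only.
prompt_batch = [[101, 7, 8, 9]]
full_output_batch = [[101, 7, 8, 9, 42, 43, 2]]

new_tokens = strip_prompt_tokens(prompt_batch, full_output_batch)
print(new_tokens)  # [[42, 43, 2]]
```

Without this step, decoding would return the system and user messages together with the model's reply.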

Using GGUF

You can also run the model locally from a GGUF file in various interfaces and workflows. Several repositories offer GGUF downloads:

mradermacher/FluentlyLM-Prinum-GGUF (all GGUF-quants)
fluently-lm/FluentlyLM-Prinum-Q4_K_M-GGUF (only Q4_K_M-quant) (coming soon...)
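
One possible way to run a GGUF quant from Python is the llama-cpp-python package, sketched below. This is not an official recipe from the project. It assumes llama-cpp-python and huggingface-hub are installed, and the filename pattern is a guess at the repository's naming, so check the repo's file list and make sure the pattern matches exactly one file. The helper names build_messages and chat_with_gguf are hypothetical.

```python
REPO_ID = "mradermacher/FluentlyLM-Prinum-GGUF"  # repo listed above
FILENAME_PATTERN = "*Q4_K_M.gguf"  # assumed naming; verify against the repo's files


def build_messages(prompt):
    """Build the same system + user message list used in the Quickstart."""
    return [
        {
            "role": "system",
            "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant.",
        },
        {"role": "user", "content": prompt},
    ]


def chat_with_gguf(prompt, n_ctx=4096, max_tokens=512):
    """Download the GGUF file on first use, then generate one chat reply."""
    # Imported lazily so the helpers above work without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=FILENAME_PATTERN,
        n_ctx=n_ctx,  # context window to allocate; smaller values use less memory
    )
    result = llm.create_chat_completion(
        messages=build_messages(prompt),
        max_tokens=max_tokens,
    )
    return result["choices"][0]["message"]["content"]
```

Calling chat_with_gguf("Write a quick sort algorithm.") downloads the quantized file and returns the reply as a string. The n_ctx default here is kept small on purpose and can be raised if enough memory is available.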

Model recipe

Evaluation

🏆 12th place on Open LLM Leaderboard

Special thanks

🤗 We are grateful for the open-source resources, technologies, and assistance from Unsloth AI, Axolotl AI, Argilla, Alibaba Cloud: Qwen, NVIDIA, and NousResearch.