---
inference: true
library_name: transformers
tags:
- fluently-lm
- fluently
- prinum
- instruct
- trained
- math
- roleplay
- reasoning
- axolotl
- unsloth
- argilla
- qwen2
license: mit
language:
- en
- fr
- es
- ru
- zh
- ja
- fa
- code
datasets:
- fluently-sets/ultraset
- fluently-sets/ultrathink
- fluently-sets/reasoning-1-1k
- fluently-sets/MATH-500-Overall
pipeline_tag: text-generation
---

# **FluentlyLM Prinum** (32B version)

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, tried different approaches, and eventually settled on the optimal one.

## Model Details

### Model Description

- **Developed by:** [@fluently-lm](https://hf.co/fluently-lm)
- **Model type:** Causal Language Model (QwenForCausalLM, LM Transformer)
- **Number of Parameters:** 32.5B
- **Number of Parameters (Non-Embedding):** 31.0B
- **Number of Layers:** 64
- **Number of Attention Heads (GQA):** 40 for Q and 8 for KV
- **Context Length:** Full 131,072 tokens
- **Language(s) (NLP):** English, French, Spanish, Russian, Chinese, Japanese, Persian *(official support)*
- **License:** MIT

### Quickstart

The code snippet below uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fluently-lm/FluentlyLM-Prinum"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize the prompt
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens from the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

#### GGUF usage

You can also run the model locally from a GGUF file in various interfaces and workflows (see the llama-cpp-python sketch at the end of this card). We offer several repositories for downloading GGUF quants:

- [mradermacher/FluentlyLM-Prinum-GGUF](https://huggingface.co/mradermacher/FluentlyLM-Prinum-GGUF) (all GGUF quants)
- [fluently-lm/FluentlyLM-Prinum-Q4_K_M-GGUF](https://huggingface.co/fluently-lm/FluentlyLM-Prinum-Q4_K_M-GGUF) (only the Q4_K_M quant) *(coming soon...)*

### Model recipe

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a3d8d58448f47df24c041a/QIkaMeP8FhcbJuvCH2GwF.png)

### Evaluation

**🏆 12th place on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#)**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a3d8d58448f47df24c041a/kGPerdFRuwCkzJCzxC7dE.png)

## Special thanks

🤗 We are grateful for the open-source resources, technologies, and assistance from: [Unsloth AI](https://unsloth.ai), [Axolotl AI](https://axolotl.ai), [Argilla](https://argilla.io), [Alibaba Cloud: Qwen](https://qwenlm.ai), [NVIDIA](https://huggingface.co/nvidia) and [NousResearch](https://nousresearch.com).
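
## GGUF usage example (llama-cpp-python)

As a supplement to the GGUF section above, here is a minimal sketch of loading one of the quants with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The choice of runtime, the `*Q4_K_M.gguf` filename pattern, and the context-length setting are assumptions for illustration, not official instructions; adjust them to match the files actually published in the GGUF repositories.

```py
# Sketch only: assumes llama-cpp-python and huggingface_hub are installed
# and that the repo contains a file matching the glob below.
from llama_cpp import Llama

# Download a quant from the Hub and load it (filename pattern is an assumption)
llm = Llama.from_pretrained(
    repo_id="mradermacher/FluentlyLM-Prinum-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,  # example context window; the model supports up to 131,072 tokens
)

messages = [
    {"role": "system", "content": "You are FluentlyLM, created by Project Fluently. You are a helpful assistant."},
    {"role": "user", "content": "Write a quick sort algorithm."},
]

# Chat-style generation using the chat template embedded in the GGUF metadata
out = llm.create_chat_completion(messages=messages, max_tokens=1024)
print(out["choices"][0]["message"]["content"])
```

Any other GGUF-compatible frontend (llama.cpp CLI, LM Studio, Ollama, etc.) should work the same way once pointed at the downloaded file.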