Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)

polka-1.1b-chat - bnb 8bits

- Model creator: https://huggingface.co/eryk-mazus/
- Original model: https://huggingface.co/eryk-mazus/polka-1.1b-chat/

Original model description:

---
tags:
- generated_from_trainer
- conversational
- polish
license: mit
language:
- pl
datasets:
- eryk-mazus/polka-dpo-v1
pipeline_tag: text-generation
inference: false
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61bf0e11c88f3fd22f654059/FiMCITBAaEyMyxCHhfWVD.png)

# Polka-1.1B-Chat

`eryk-mazus/polka-1.1b-chat` **is the first Polish model trained to act as a helpful, conversational assistant that can be run locally.**

The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) with a custom, extended tokenizer for more efficient Polish text generation, and was additionally pretrained on 5.7 billion tokens. **It was then fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) performed on top of it.**

Context size: 4,096 tokens

In addition, we're releasing:
* [polka-1.1b](https://huggingface.co/eryk-mazus/polka-1.1b) - our base model with an extended tokenizer and additional pre-training on a Polish corpus sampled using [DSIR](https://github.com/p-lambda/dsir)
* [polka-pretrain-en-pl-v1](https://huggingface.co/datasets/eryk-mazus/polka-pretrain-en-pl-v1) - the pre-training dataset
* [polka-dpo-v1](https://huggingface.co/datasets/eryk-mazus/polka-dpo-v1) - dataset of DPO pairs
* [polka-1.1b-chat-gguf](https://huggingface.co/eryk-mazus/polka-1.1b-chat-gguf) - GGUF files for the chat model

## Usage

Sample code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)

streamer = TextStreamer(tokenizer, skip_prompt=True)

# "You are a helpful assistant."
system_prompt = "Jesteś pomocnym asystentem."
chat = [{"role": "system", "content": system_prompt}]

# "Write a short song about programming."
user_input = "Napisz krótką piosenkę o programowaniu."
chat.append({"role": "user", "content": user_input})

# Generate - add_generation_prompt makes sure the model continues as the assistant
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")

# For multi-GPU setups, find the device of the first parameter of the model
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Add just the newly generated tokens to our chat
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
```

The model works seamlessly with [vLLM](https://github.com/vllm-project/vllm) as well.
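As a reference, below is a minimal offline-inference sketch with vLLM. The sampling settings simply mirror the `transformers` example above and are illustrative rather than taken from the original card; the prompt is rendered with the model's own chat template.

```python
# Minimal vLLM sketch (illustrative settings, not from the original card)
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_name = "eryk-mazus/polka-1.1b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},          # "You are a helpful assistant."
    {"role": "user", "content": "Napisz krótką piosenkę o programowaniu."}, # "Write a short song about programming."
]
# Render the ChatML prompt as a string instead of token IDs
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Sampling parameters matching the transformers example above
sampling_params = SamplingParams(temperature=0.2, top_p=0.95, repetition_penalty=1.15, max_tokens=512)

llm = LLM(model=model_name)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```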
## Prompt format

This model uses ChatML as the prompt format:

```
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
```

This format is available as a [chat template](https://huggingface.co/docs/transformers/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method, as demonstrated in the example above.
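To inspect the rendered prompt directly, you can call the chat template with `tokenize=False`; this is a small sketch and the messages are illustrative:

```python
# Render the ChatML prompt as a plain string to see the format above
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat", use_fast=True)

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    {"role": "user", "content": "Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?"},
]

# tokenize=False returns the formatted prompt as a string instead of token IDs;
# add_generation_prompt=True appends the opening "<|im_start|>assistant" tag
# so generation continues as the assistant.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)
```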