Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
# polka-1.1b-chat - bnb 8bits
- Model creator: https://huggingface.co/eryk-mazus/
- Original model: https://huggingface.co/eryk-mazus/polka-1.1b-chat/
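A minimal loading sketch (not from this card; it assumes `bitsandbytes` is installed and quantizes the original weights on the fly to 8-bit, mirroring what this repo provides):

```python
# Hedged sketch: load the model in 8-bit via bitsandbytes through transformers.
# This mirrors the bnb 8-bit quantization this repo ships.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "eryk-mazus/polka-1.1b-chat"  # original weights, quantized on the fly

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```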
Original model description:
---
tags:
- generated_from_trainer
- conversational
- polish
license: mit
language:
- pl
datasets:
- eryk-mazus/polka-dpo-v1
pipeline_tag: text-generation
inference: false
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61bf0e11c88f3fd22f654059/FiMCITBAaEyMyxCHhfWVD.png)
# Polka-1.1B-Chat
`eryk-mazus/polka-1.1b-chat` **is the first Polish model trained to act as a helpful, conversational assistant that can be run locally.**
The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) with a custom, extended tokenizer for more efficient Polish text generation, and was additionally pretrained on 5.7 billion tokens. **It was then fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) performed on top of it.**
Context size: 4,096 tokens
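One practical effect of the extended tokenizer is shorter token sequences for Polish text. A quick check (a sketch, not from the original card; the sample sentence and the expected direction of the difference are our assumptions):

```python
# Illustrative comparison: the extended Polish tokenizer should need fewer
# tokens for Polish text than the base TinyLlama tokenizer.
from transformers import AutoTokenizer

polka_tok = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat")
base_tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")

text = "Wczoraj wieczorem poszliśmy na długi spacer nad Wisłą."
print(len(polka_tok(text)["input_ids"]), len(base_tok(text)["input_ids"]))
```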
In addition, we're releasing:
* [polka-1.1b](https://huggingface.co/eryk-mazus/polka-1.1b) - our base model with an extended tokenizer and additional pre-training on Polish corpus sampled using [DSIR](https://github.com/p-lambda/dsir)
* [polka-pretrain-en-pl-v1](https://huggingface.co/datasets/eryk-mazus/polka-pretrain-en-pl-v1) - the pre-training dataset
* [polka-dpo-v1](https://huggingface.co/datasets/eryk-mazus/polka-dpo-v1) - dataset of DPO pairs
* [polka-1.1b-chat-gguf](https://huggingface.co/eryk-mazus/polka-1.1b-chat-gguf) - GGUF files for the chat model
## Usage
Sample code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)

streamer = TextStreamer(tokenizer, skip_prompt=True)

# "You are a helpful assistant."
system_prompt = "Jesteś pomocnym asystentem."
chat = [{"role": "system", "content": system_prompt}]

# "Write a short song about programming."
user_input = "Napisz krótką piosenkę o programowaniu."
chat.append({"role": "user", "content": user_input})

# Generate - add_generation_prompt makes sure the model continues as the assistant
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")

# For multi-GPU setups, send inputs to the device of the model's first parameter
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Keep only the newly generated tokens and append them to the chat
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
```
The model works seamlessly with [vLLM](https://github.com/vllm-project/vllm) as well.
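For instance, a minimal vLLM sketch (illustrative, not from the original card; the sampling values are copied from the example above):

```python
# Hedged sketch: serve the same chat through vLLM's offline API.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_name = "eryk-mazus/polka-1.1b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name)

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},            # "You are a helpful assistant."
    {"role": "user", "content": "Napisz krótką piosenkę o programowaniu."},  # "Write a short song about programming."
]
# Render the ChatML prompt with the model's own chat template.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

params = SamplingParams(temperature=0.2, top_p=0.95, repetition_penalty=1.15, max_tokens=512)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```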
## Prompt format
This model uses ChatML as the prompt format:
```
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
```
This prompt is available as a [chat template](https://huggingface.co/docs/transformers/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method, as demonstrated in the example above.
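For example, you can render the exact prompt string without tokenizing (a short sketch; the messages are taken from the example above):

```python
# Sketch: render the ChatML prompt string with the model's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat")
chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    {"role": "user", "content": "Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?"},
]
# tokenize=False returns the raw string; add_generation_prompt appends the
# opening "<|im_start|>assistant" so generation continues as the assistant.
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```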