Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


polka-1.1b-chat - bnb 4bits
- Model creator: https://huggingface.co/eryk-mazus/
- Original model: https://huggingface.co/eryk-mazus/polka-1.1b-chat/
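
If you would rather quantize on the fly than download prequantized weights, an equivalent 4-bit load of the original model with `bitsandbytes` looks roughly like this. A minimal sketch: the quantization settings shown are illustrative defaults, not necessarily the ones used for this upload.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit settings; not necessarily those used for this upload
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "eryk-mazus/polka-1.1b-chat",
    quantization_config=bnb_config,  # requires the bitsandbytes package and a CUDA GPU
    device_map="auto",
)
```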

Original model description:
---
tags:
- generated_from_trainer
- conversational
- polish
license: mit
language:
- pl
datasets:
- eryk-mazus/polka-dpo-v1
pipeline_tag: text-generation
inference: false
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61bf0e11c88f3fd22f654059/FiMCITBAaEyMyxCHhfWVD.png)

# Polka-1.1B-Chat

`eryk-mazus/polka-1.1b-chat` **is the first Polish model trained to act as a helpful, conversational assistant that can be run locally.**

The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) with a custom, extended tokenizer for more efficient Polish text generation, and was additionally pretrained on 5.7 billion tokens. **It was then fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) performed on top.**
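
For readers curious what that DPO stage looks like in code, here is a minimal sketch using TRL's `DPOTrainer` on the released preference pairs. This is illustrative only, not the authors' training script: the starting checkpoint, the hyperparameters, and the TRL version (a recent one, where the tokenizer is passed as `processing_class`) are all assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# DPO is applied on top of the SFT'd model; the SFT checkpoint is not released
# separately, so the base model id below is only a stand-in.
model = AutoModelForCausalLM.from_pretrained("eryk-mazus/polka-1.1b")
tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b")

# The released dataset of DPO preference pairs (prompt, chosen, rejected)
dataset = load_dataset("eryk-mazus/polka-dpo-v1", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="polka-1.1b-dpo", beta=0.1),  # beta is a placeholder value
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```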

Context size: 4,096 tokens

In addition, we're releasing:
* [polka-1.1b](https://huggingface.co/eryk-mazus/polka-1.1b) - our base model with an extended tokenizer and additional pre-training on a Polish corpus sampled using [DSIR](https://github.com/p-lambda/dsir)
* [polka-pretrain-en-pl-v1](https://huggingface.co/datasets/eryk-mazus/polka-pretrain-en-pl-v1) - the pre-training dataset
* [polka-dpo-v1](https://huggingface.co/datasets/eryk-mazus/polka-dpo-v1) - dataset of DPO pairs
* [polka-1.1b-chat-gguf](https://huggingface.co/eryk-mazus/polka-1.1b-chat-gguf) - GGUF files for the chat model

## Usage

Sample code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)
streamer = TextStreamer(tokenizer, skip_prompt=True)

# "You are a helpful assistant."
system_prompt = "Jesteś pomocnym asystentem."
chat = [{"role": "system", "content": system_prompt}]

# "Write a short song about programming."
user_input = "Napisz krótką piosenkę o programowaniu."
chat.append({"role": "user", "content": user_input})

# add_generation_prompt=True makes sure the model continues as the assistant
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
# For multi-GPU setups, move the inputs to the device of the model's first parameter
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Append only the newly generated tokens to the chat history
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
```

The model works seamlessly with [vLLM](https://github.com/vllm-project/vllm) as well.
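
As a rough sketch of offline inference, assuming a recent vLLM release, with the ChatML prompt built by hand as described in the next section:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="eryk-mazus/polka-1.1b-chat")
params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=512)

# Build the ChatML prompt manually (see "Prompt format" below)
prompt = (
    "<|im_start|>system\nJesteś pomocnym asystentem.<|im_end|>\n"
    "<|im_start|>user\nNapisz krótką piosenkę o programowaniu.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```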

## Prompt format

This model uses ChatML as the prompt format:
```
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
```

(In English: system: "You are a helpful assistant."; user: "What is an adult's daily caloric requirement?"; assistant: "For adults, an intake of around 2,000-3,000 kcal per day is recommended to maintain optimal health and well-being.")

This prompt is available as a [chat template](https://huggingface.co/docs/transformers/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method, as demonstrated in the example above.
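
To inspect the rendered prompt without running the model, you can ask the chat template for the raw string; a quick check along these lines:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat")

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},  # "You are a helpful assistant."
    {"role": "user", "content": "Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?"},
]

# tokenize=False returns the formatted ChatML string instead of token ids
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```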