Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

polka-1.1b-chat - bnb 4bits
- Model creator: https://huggingface.co/eryk-mazus/
- Original model: https://huggingface.co/eryk-mazus/polka-1.1b-chat/
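
As a rough, back-of-envelope illustration of why a bnb 4-bit build matters for running the model locally (assuming 1.1B parameters, and ignoring real-world overhead such as quantization scales and layers kept in 16-bit precision), 4-bit storage cuts the weight footprint roughly 4x versus fp16:

```python
# Back-of-envelope weight memory for a 1.1B-parameter model.
# Real checkpoints add overhead (quantization scales, embeddings
# often kept in higher precision), so treat these as lower bounds.
params = 1_100_000_000

fp16_gb = params * 2 / 1024**3    # fp16: 2 bytes per weight
int4_gb = params * 0.5 / 1024**3  # 4-bit: 0.5 bytes per weight

print(f"fp16: {fp16_gb:.2f} GiB, 4-bit: {int4_gb:.2f} GiB")
```

At roughly half a gibibyte of weights, the 4-bit variant fits comfortably in consumer GPU or even CPU memory.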

Original model description:
---
tags:
- generated_from_trainer
- conversational
- polish
license: mit
language:
- pl
datasets:
- eryk-mazus/polka-dpo-v1
pipeline_tag: text-generation
inference: false
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61bf0e11c88f3fd22f654059/FiMCITBAaEyMyxCHhfWVD.png)

# Polka-1.1B-Chat

`eryk-mazus/polka-1.1b-chat` **is the first Polish model trained to act as a helpful, conversational assistant that can be run locally.**

The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) with a custom, extended tokenizer for more efficient Polish text generation, and was additionally pretrained on 5.7 billion tokens. **It was then fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) applied on top.**
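
For reference, DPO skips training a separate reward model and optimizes the policy directly on preference pairs $(x, y_w, y_l)$, where $y_w$ is the preferred and $y_l$ the rejected response. In the notation of the linked paper, the objective is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\pi_{\mathrm{ref}}$ is the frozen reference model (the SFT checkpoint) and $\beta$ controls how far the policy may drift from it.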

Context size: 4,096 tokens

In addition, we're releasing:
* [polka-1.1b](https://huggingface.co/eryk-mazus/polka-1.1b) - our base model with an extended tokenizer and additional pre-training on a Polish corpus sampled using [DSIR](https://github.com/p-lambda/dsir)
* [polka-pretrain-en-pl-v1](https://huggingface.co/datasets/eryk-mazus/polka-pretrain-en-pl-v1) - the pre-training dataset
* [polka-dpo-v1](https://huggingface.co/datasets/eryk-mazus/polka-dpo-v1) - the dataset of DPO pairs
* [polka-1.1b-chat-gguf](https://huggingface.co/eryk-mazus/polka-1.1b-chat-gguf) - GGUF files for the chat model

## Usage

Sample code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)
streamer = TextStreamer(tokenizer, skip_prompt=True)

# "You are a helpful assistant."
system_prompt = "Jesteś pomocnym asystentem."
chat = [{"role": "system", "content": system_prompt}]

# "Compose a short song about programming."
user_input = "Napisz krótką piosenkę o programowaniu."
chat.append({"role": "user", "content": user_input})

# Generate - add_generation_prompt makes sure the model continues as the assistant
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
# For multi-GPU, find the device of the first parameter of the model
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Append just the newly generated tokens to the chat
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
```

The model works seamlessly with [vLLM](https://github.com/vllm-project/vllm) as well.

## Prompt format

This model uses ChatML as the prompt format:
```
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
```
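
If you need to build the prompt outside of `transformers` (for example, when sending raw text to a completion endpoint), the layout above can be reproduced with a small helper. This is a sketch of the generic ChatML convention, not the tokenizer's canonical template, so prefer `tokenizer.apply_chat_template()` when it is available:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},  # "You are a helpful assistant."
    {"role": "user", "content": "Cześć!"},                         # "Hi!"
]
print(to_chatml(chat))
```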

This prompt is available as a [chat template](https://huggingface.co/docs/transformers/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method, as demonstrated in the example above.