instruction-pretrain committed
Commit • 91c109a
Parent(s): 2e7b4f8
Update README.md

README.md CHANGED
@@ -128,16 +128,15 @@ We simply discard the system prompts.
 
 **To put it all together, the text before tokenization looks like this:**
 
-`general_instruction_response_text = "<|begin_of_text|>{question} {response}<|end_of_text|>"`
-
-or
-
-`instruction_augmented_text = "<|begin_of_text|>{instruction augmented text}<|end_of_text|>"`
+```python
+general_instruction_response_text = "<|begin_of_text|>{question} {response}<|end_of_text|>"
 
+instruction_augmented_text = "<|begin_of_text|>{instruction augmented text}<|end_of_text|>"
+```
 
 Then, for tokenization, you don't need to add BOS and EOS token ids. The tokenization code looks like this:
-
-
-
+```python
+text_ids = tokenizer(text, add_special_tokens=False, **kwargs).input_ids
+```
 
 ## Citation
 If you find our work helpful, please cite us:
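The two templates added in the first hunk are plain format strings, so filling them in is simple string interpolation. Below is a minimal sketch of that step; only the two template strings come from the diff, while the helper names and the sample question/response are illustrative:

```python
# Hedged sketch: only the two template strings are taken from the README
# diff above; the helper names and example data are made up for illustration.

def format_general_pair(question: str, response: str) -> str:
    # Matches general_instruction_response_text in the README: question and
    # response are joined by a single space between the
    # <|begin_of_text|> / <|end_of_text|> sentinels.
    return f"<|begin_of_text|>{question} {response}<|end_of_text|>"

def format_instruction_augmented(text: str) -> str:
    # Matches instruction_augmented_text: the already instruction-augmented
    # corpus text goes between the same two sentinels.
    return f"<|begin_of_text|>{text}<|end_of_text|>"

print(format_general_pair("What is 2 + 2?", "4"))
# <|begin_of_text|>What is 2 + 2? 4<|end_of_text|>
```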
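The tokenization line in the second hunk works because the sentinels are already written into the text, which is why `add_special_tokens=False` is passed: the tokenizer must not prepend its own BOS or append its own EOS. A runnable sketch using the Hugging Face `transformers` tokenizer API; the checkpoint name is a placeholder, not one the diff specifies:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint: any tokenizer whose vocabulary includes the
# <|begin_of_text|> and <|end_of_text|> sentinels behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "<|begin_of_text|>What is 2 + 2? 4<|end_of_text|>"

# add_special_tokens=False stops the tokenizer from adding its own special
# tokens, so the only ones present are the sentinels already in `text`.
text_ids = tokenizer(text, add_special_tokens=False).input_ids
```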