jartine committed
Commit 9fa6731
1 Parent(s): 114c61e

Update README.md

Files changed (1):
  1. README.md +52 -25

README.md CHANGED
@@ -26,16 +26,15 @@ history_template: |
 {{message}}<|eot_id|>
 ---

-# Meta Llama 3.1 8B - llamafile
+# Meta Llama 3.1 8B Instruct - llamafile

 This is a large language model that was released by Meta on 2024-07-23.
-It's big enough to be capable of being put to serious use, and it's
-small enough to be capable of running on most personal computers. This
-repo contains the base model, which has not been fine-tuned to follow
-instructions.
+It was fine-tuned by Meta to follow your instructions. It's big enough
+to be capable of being put to serious use, and it's small enough to be
+capable of running on most personal computers.

 - Model creator: [Meta](https://huggingface.co/meta-llama/)
-- Original model: [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)
+- Original model: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

 Mozilla has packaged the LLaMA model into executable weights that we
 call [llamafiles](https://github.com/Mozilla-Ocho/llamafile). This gives
@@ -45,20 +44,15 @@ FreeBSD, OpenBSD and NetBSD systems you control on both AMD64 and ARM64.
 ## Quickstart

 Running the following on a desktop OS will launch a tab in your web
-browser.
+browser with a chatbot interface.

 ```
-wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
-chmod +x Meta-Llama-3.1-8B.Q6_K.llamafile
-./Meta-Llama-3.1-8B.Q6_K.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-Instruct-llamafile/resolve/main/Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
+chmod +x Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
+./Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
 ```

-You can then use the completion mode of the GUI to experiment with this
-model. You can prompt the model for completions on the command line too:
-
-```
-./Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven' --log-disable
-```
+You then need to fill out the prompt / history template (see below).

 This model has a max context window size of 128k tokens. By default, a
 context window size of 512 tokens is used. You may increase this to the
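A note on raising the context window mentioned in the hunk above: llamafiles accept llama.cpp-style command-line options, and other Mozilla llamafile cards describe a `-c` flag where `-c 0` requests the model's maximum. Treat the flag behavior as an assumption carried over from those cards, not something stated in this diff:

```
# Assumption: -c sets the context window in tokens; -c 0 requests the model maximum.
./Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile -c 0
```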
@@ -79,6 +73,25 @@ Having **trouble?** See the ["Gotchas"
 section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)
 of the README.

+## Prompting
+
+To have a good working chat experience when using the web GUI, you need
+to fill out the text fields with the following values.
+
+Prompt template:
+
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+{{prompt}}<|eot_id|>{{history}}<|start_header_id|>{{char}}<|end_header_id|>
+```
+
+History template:
+
+```
+<|start_header_id|>{{name}}<|end_header_id|>
+{{message}}<|eot_id|>
+```
+
 ## About llamafile

 llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
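For concreteness (an illustrative expansion, not part of the upstream card): filling the new templates above with a system prompt of "You are a helpful assistant.", one user turn, and `{{char}}` set to `assistant` yields a rendered prompt like:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Who wrote Hamlet?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```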
@@ -188,35 +201,49 @@ Where to send questions or comments about the model Instructions on how to provi

 ## How to use

-This repository contains two versions of Meta-Llama-3.1-8B, for use with transformers and with the original `llama` codebase.
+This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original `llama` codebase.

 ### Use with transformers

-Starting with transformers >= 4.43.0 onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
+Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

-Make sure to update your transformers installation via pip install --upgrade transformers.
+Make sure to update your transformers installation via `pip install --upgrade transformers`.

 ```python
 import transformers
 import torch

-model_id = "meta-llama/Meta-Llama-3.1-8B"
+model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

 pipeline = transformers.pipeline(
-    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
 )

-pipeline("Hey how are you doing today?")
+messages = [
+    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+    {"role": "user", "content": "Who are you?"},
+]
+
+outputs = pipeline(
+    messages,
+    max_new_tokens=256,
+)
+print(outputs[0]["generated_text"][-1])
 ```

+Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes)
+
 ### Use with `llama`

-Please, follow the instructions in the [repository](https://github.com/meta-llama/llama).
+Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

 To download Original checkpoints, see the example command below leveraging `huggingface-cli`:

 ```
-huggingface-cli download meta-llama/Meta-Llama-3.1-8B --include "original/*" --local-dir Meta-Llama-3.1-8B
+huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --include "original/*" --local-dir Meta-Llama-3.1-8B-Instruct
 ```

 ## Hardware and Software
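The new card text mentions the Auto classes with `generate()` as an alternative to `pipeline`, but only shows the pipeline route. A minimal sketch of the Auto-classes route (hyperparameters are illustrative; access to the gated repo is assumed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation with the tokenizer's built-in Llama 3.1 chat
# template, then generate and decode only the newly produced tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```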
@@ -1096,4 +1123,4 @@ Finally, we put in place a set of resources including an [output reporting mecha

 The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

-But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide), [Trust and Safety](https://llama.meta.com/trust-and-safety/) solutions, and other [resources](https://llama.meta.com/docs/get-started/) to learn more about responsible development.
+But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide), [Trust and Safety](https://llama.meta.com/trust-and-safety/) solutions, and other [resources](https://llama.meta.com/docs/get-started/) to learn more about responsible development.
 