## Model Description

LLaMA-2-7B-32K-Chat is an open-source, long-context chat model finetuned from [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) on high-quality instruction and chat data.
We build Llama-2-7B-32K-Chat with fewer than 200 lines of Python using the [Together API](https://together.ai/blog/api-announcement), and we make the recipe fully available.
We hope this enables everyone to finetune their own version of [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K). Play with it through the [Together API](https://together.ai/blog/api-announcement) and give us feedback!
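
The recipe is available in full from the Github repo linked below. As a rough sketch of its shape, the flow reduces to uploading the instruction dataset and launching a finetune job from the base model. The endpoint paths, payload fields, and hyperparameters below are illustrative assumptions, not the documented Together API schema:

```python
import os
import requests

API_BASE = "https://api.together.xyz"  # assumption: placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# Hypothetical sketch: upload the instruction data, then start a finetune
# job from the LLaMA-2-7B-32K base model. Field names are illustrative.
with open("llama_instruct.jsonl", "rb") as f:
    file_id = requests.post(f"{API_BASE}/files", headers=HEADERS,
                            files={"file": f}).json()["id"]

job = requests.post(
    f"{API_BASE}/fine-tunes",
    headers=HEADERS,
    json={
        "model": "togethercomputer/LLaMA-2-7B-32K",
        "training_file": file_id,
        "n_epochs": 3,  # illustrative hyperparameter
    },
).json()
print("finetune job id:", job.get("id"))
```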

## Data Collection

LLaMA-2-7B-32K-Chat is fine-tuned over two datasets: (1) 19K single- and multi-round conversations generated from human instructions and Llama-2-70B-Chat outputs, and (2) 4K summarization instructions from the BookSum dataset.
We collected dataset (1) following the distillation paradigm used by Alpaca, Vicuna, WizardLM, and Orca: producing instruction data by querying a powerful LLM.
We release this dataset [here](https://huggingface.co/datasets/togethercomputer/llama-instruct). For more details, please refer to our [Github repo](https://github.com/togethercomputer/LLaMA-2-32K-Chat).
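
A minimal sketch of that distillation loop is below. The `chat_70b` helper is a hypothetical stand-in for whatever client or serving stack hosts the teacher model, and the file names are illustrative:

```python
import json

def chat_70b(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted Llama-2-70B-Chat
    endpoint; substitute your own client here."""
    raise NotImplementedError

# Distillation loop: wrap each human-written instruction in the Llama-2
# chat format, query the teacher model, and store the resulting pair.
with open("human_instructions.txt") as src, \
     open("llama_instruct.jsonl", "w") as out:
    for instruction in (line.strip() for line in src):
        if not instruction:
            continue
        response = chat_70b(f"[INST] {instruction} [/INST]")
        out.write(json.dumps({"instruction": instruction,
                              "response": response}) + "\n")
```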

## Model Usage

You can use the [Together API](https://together.ai/blog/api-announcement) to try out LLaMA-2-7B-32K-Chat for inference; the updated inference stack allows for efficient inference.
The model will also be hosted on the [Together Playground](https://api.together.xyz/playground), where you can simply play with it by formatting prompts as

```
[INST] <your instruction here> [/INST]
```

For example, if we query the model with

```
[INST] Write a poem about cats [/INST]
```

the model will return

```
Sure, here's a short poem about cats:

Cats, cats, oh how they enthrall,
With their soft fur and gentle grace,
They purr with joy, their eyes aglow,
And steal our hearts with their charm and grace.

In their own time, they move about,
With graceful strides, they tiptoe out,
Their whiskers twitching, as they go,
Their curiosity piqued, they want to know.

They climb, they jump, they play, they chase,
Their agility, a sight to behold,
Their naps, a treat, a break from the chase,
A time to rest, a time to be told.

Cats, cats, they're unique and special,
A part of our lives, a treasure to hold,
Their personalities, so diverse,
Their charm, a gift, that's forever told.
```
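
To run the model locally, installing Flash Attention V2 is recommended for best performance. The sketch below uses Hugging Face transformers; the checkpoint id is assumed from this card's title, and the generation parameters are illustrative:

```python
# Requires: pip install transformers sentencepiece accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumption: the repo id mirrors this card's title; adjust if it differs.
model_id = "togethercomputer/LLaMA-2-7B-32K-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the instruction in the [INST] ... [/INST] chat format shown above.
prompt = "[INST] Write a poem about cats [/INST]"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256,
                        temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```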

## Limitations and Bias