Update README.md: add PG19 evaluation results
README.md
@@ -80,6 +80,18 @@ Their personalities, so diverse,
 Their charm, a gift, that's forever told.
 ```
 
+## Model Evaluation
+
+We evaluate the model on the [PG19 dataset](https://huggingface.co/datasets/pg19) and compare its perplexity with that of [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
+The results are summarized below (note that the perplexity is normalized following the protocol [here](https://together.ai/blog/llama-2-7b-32k)).
+
+| Model | 2K Seq | 4K Seq | 8K Seq | 16K Seq | 32K Seq |
+| -------- | ------- | ------- | ------- | ------- | ------- |
+| LLaMA-2-7B-Chat (Meta) | 1.844 | 1.833 | N/A | N/A | N/A |
+| LLaMA-2-7B-32K-Chat (ours) | 1.813 | 1.798 | 1.781 | 1.778 | 1.772 |
+
+We observe that LLaMA-2-7B-32K-Chat attains perplexity comparable to, and even slightly better than, the original LLaMA-2-7B-Chat model.
+
 ## Limitations and Bias
 
 As with all language models, LLaMA-2-7B-32K-Chat may generate incorrect or biased content. It's important to keep this in mind when using the model.
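
For readers who want a rough reproduction of this measurement, below is a minimal sketch of chunked perplexity evaluation on PG19 with Hugging Face Transformers. It is an illustration only, not the exact normalization protocol from the linked blog post; the model id, the number of books sampled, and the non-overlapping chunking scheme are all assumptions.

```python
# Sketch: chunked perplexity on PG19 at a fixed sequence length.
# Assumes `transformers`, `datasets`, `accelerate`, and `torch` are installed.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/LLaMA-2-7B-32K"  # assumed model id
seq_len = 4096                                # e.g. the "4K Seq" column

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

ds = load_dataset("pg19", split="test")

total_nll, total_tokens = 0.0, 0
for book in ds.select(range(4)):  # a few books for a quick estimate
    ids = tokenizer(book["text"], return_tensors="pt").input_ids[0]
    # Score non-overlapping chunks of `seq_len` tokens.
    for start in range(0, ids.size(0) - seq_len, seq_len):
        chunk = ids[start : start + seq_len].unsqueeze(0).to(model.device)
        with torch.no_grad():
            out = model(chunk, labels=chunk)
        # `out.loss` is the mean NLL over the seq_len - 1 predicted tokens.
        total_nll += out.loss.item() * (seq_len - 1)
        total_tokens += seq_len - 1

print(f"perplexity @ {seq_len}: {math.exp(total_nll / total_tokens):.3f}")
```

Raw token-level perplexity from a loop like this will not match the table exactly, since the numbers above are normalized per the linked protocol; the sketch is only meant to show the shape of the evaluation.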