---
license: llama2
language:
- en
library_name: transformers
datasets:
- togethercomputer/llama-instruct
---

# LLaMA-2-7B-32K-Chat

## Model Description

LLaMA-2-7B-32K-Chat is an open-source, long-context chat model fine-tuned from [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) on high-quality instruction and chat data.
We built LLaMA-2-7B-32K-Chat with fewer than 200 lines of Python using the [Together API](https://together.ai/blog/api-announcement), and we make the recipe fully available.
For more details, please refer to our [GitHub repo](https://github.com/togethercomputer/LLaMA-2-32K-Chat).
We hope this enables everyone to fine-tune their own version of [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K). Try the [Together API](https://together.ai/blog/api-announcement) and give us feedback!

## Data Collection Details

LLaMA-2-7B-32K-Chat is fine-tuned over a combination of two parts:
1. **19K single- and multi-round conversations generated from human instructions and [Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) outputs**.
   We collected the dataset following the distillation paradigm used by Alpaca, Vicuna, WizardLM, and Orca: producing instruction data by querying a powerful LLM (in this case, [Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)).
   The complete dataset is released [here](https://huggingface.co/datasets/togethercomputer/llama-instruct).
   We also share the complete recipe for the data collection process [here](https://github.com/togethercomputer/LLaMA-2-32K-Chat).

2. **4K summarization instructions from the BookSum dataset**.
   BookSum is a unique dataset designed to address the challenges of long-form narrative summarization.
   This dataset features source documents from the literature domain, including novels, plays, and stories, and offers human-written, highly abstractive summaries.
   Here we focus on chapter-level data. BookSum poses a distinctive challenge: the model must read each chapter in full to summarize it.
   We used 4K of the instructions in our fine-tuning.
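
To make the mix concrete, the sketch below computes the share of each part from the rounded counts quoted above (19K and 4K). The helper name `mixture_fractions` and the dictionary keys are illustrative assumptions, not names from the released recipe, and the exact example counts may differ from these rounded figures.

```python
def mixture_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each dataset's share of the combined fine-tuning mix."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Rounded counts taken from the description above (assumed, not exact).
counts = {
    "conversations": 19_000,          # human instructions + Llama-2-70B-Chat outputs
    "booksum_summarization": 4_000,   # chapter-level BookSum instructions
}

fractions = mixture_fractions(counts)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

Under these rounded counts, conversations make up roughly five sixths of the mix and long-form summarization the rest.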


## Model Usage

We encourage you to try out this model using the [Together API](https://together.ai/blog/api-announcement). The updated inference stack allows for efficient inference.
Alternatively, you can load the model directly from the Hugging Face model hub:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights in half precision.
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K-Chat")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K-Chat", trust_remote_code=True, torch_dtype=torch.float16)
```

The model is also hosted on [Together Playground](https://api.together.xyz/playground). You can try the model with a prompt formatted as:

```
[INST] <your instruction here> [/INST]
```

For example, if we query the model with

```
[INST] Write a poem about cats [/INST]
```

the model returns a response such as:
```
Sure, here's a short poem about cats:

Cats, cats, oh how they enthrall,
With their soft fur and gentle grace,
They purr with joy, their eyes aglow,
And steal our hearts with their charm and grace.

In their own time, they move about,
With graceful strides, they tiptoe out,
Their whiskers twitching, as they go,
Their curiosity piqued, they want to know.

They climb, they jump, they play, they chase,
Their agility, a sight to behold,
Their naps, a treat, a break from the chase,
A time to rest, a time to be told.

Cats, cats, they're unique and special,
A part of our lives, a treasure to hold,
Their personalities, so diverse,
Their charm, a gift, that's forever told.
```
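
The prompt format and the loading snippet can be combined into a small end-to-end sketch. This is an illustration under assumptions, not the official inference path: the helper names `build_prompt` and `generate_reply` are hypothetical, the closing tag is assumed to be `[/INST]` as in the standard Llama 2 chat format, and the generation settings are arbitrary defaults. Calling `generate_reply` downloads the model weights, and sampled or greedy output will generally differ from the poem above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single-turn instruction in the [INST] ... [/INST] format."""
    return f"[INST] {instruction.strip()} [/INST]"


def generate_reply(instruction: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a reply (downloads weights on first use)."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "togethercomputer/LLaMA-2-7B-32K-Chat"
    # Use half precision on GPU; fall back to float32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, trust_remote_code=True, torch_dtype=dtype
    ).to(device)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_reply("Write a poem about cats")` would send the same prompt as the example above.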

## Model Evaluation

We evaluate the model on the [PG19 dataset](https://huggingface.co/datasets/pg19) and compare its perplexity with that of [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
The results are summarized below (note that the perplexity is normalized following the protocol described [here](https://together.ai/blog/llama-2-7b-32k)).

| Model | 2K Seq | 4K Seq | 8K Seq | 16K Seq | 32K Seq |
| -------- | ------- | ------- | ------- | ------- | ------- |
| LLaMA-2-7B-Chat (Meta) | 1.844 | 1.833 | N/A | N/A | N/A |
| LLaMA-2-7B-32K-Chat (ours) | 1.813 | 1.798 | 1.781 | 1.778 | 1.772 |

We observe that LLaMA-2-7B-32K-Chat achieves perplexity comparable to, and slightly lower than, the original LLaMA-2-7B-Chat at 2K and 4K sequence lengths, and its perplexity continues to decrease as the sequence length grows to 32K.
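
For readers unfamiliar with the metric, the sketch below shows the textbook definition of perplexity as the exponential of the mean per-token negative log-likelihood. It does not reproduce the normalization protocol linked above, so its values are not directly comparable to the table; the function name `perplexity` and the toy inputs are assumptions for illustration.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy example: a model that assigns probability 1/2 to every token
# has a per-token NLL of ln(2), which gives a perplexity of 2.
toy_nlls = [math.log(2)] * 4
print(perplexity(toy_nlls))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why the decreasing values at longer sequence lengths indicate the model benefits from the extra context.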

## Limitations and Bias

As with all language models, LLaMA-2-7B-32K-Chat may generate incorrect or biased content. It's important to keep this in mind when using the model.

## Community

Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4).