## Vicuna Weights

| Weights version | Link | FastChat version compatibility | Base Model | Release Date | Fine-tuning Data |
| ---- | ---- | ---- | ---- | ---- | ---- |
| v1.5 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.5), [7B-16k](https://huggingface.co/lmsys/vicuna-7b-v1.5-16k), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.5), [13B-16k](https://huggingface.co/lmsys/vicuna-13b-v1.5-16k) | `>=0.2.21` | Llama 2 | Aug. 1, 2023 | 370M tokens |
| v1.3 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.3), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.3), [33B](https://huggingface.co/lmsys/vicuna-33b-v1.3) | `>=0.2.1` | Llama 1 | Jun. 22, 2023 | 370M tokens |
| v1.1 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.1), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.1) | `>=0.2.1` | Llama 1 | Apr. 12, 2023 | - |
| v0 | [7B-delta](https://huggingface.co/lmsys/vicuna-7b-delta-v0), [13B-delta](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | `<=0.1.10` | Llama 1 | Mar. 30, 2023 | - |
### Updates
- Major updates of weights v1.5
  - Use Llama 2 as the base model.
  - Provide 16K context length versions using linear RoPE scaling.
- Major updates of weights v1.3
  - Train with twice the amount of ShareGPT data compared to previous versions.
  - Provide merged weights directly instead of delta weights.
- Major updates of weights v1.1
  - Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from `###` to the EOS token `</s>`. This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.
  - Fix the supervised fine-tuning loss computation for better model quality.
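The switch to the `</s>` separator makes stop detection a simple substring check on the generated text. A minimal sketch (the helper name is illustrative, not part of FastChat):

```python
def truncate_at_eos(text, eos="</s>"):
    """Cut generated text at the first EOS separator, as used by Vicuna v1.1+."""
    idx = text.find(eos)
    return text if idx == -1 else text[:idx]
```

For example, `truncate_at_eos("I am good.</s>USER: next")` returns `"I am good."`.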
## Prompt Template

### Example prompt (weights v1.1, v1.3, v1.5)
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
```
See a full prompt template [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L286-L299) and example output [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L748-L753).
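Assembling this template in code is straightforward. A hedged sketch (the function name and exact whitespace between turns are assumptions; the linked `conversation.py` is the authoritative template):

```python
def build_vicuna_prompt(system, turns):
    """Assemble a Vicuna v1.1+ style prompt.
    `turns` is a list of (user_msg, assistant_msg_or_None) pairs;
    completed assistant replies are terminated with the EOS token </s>."""
    parts = [system]
    for user_msg, assistant_msg in turns:
        parts.append(f"USER: {user_msg}")
        if assistant_msg is not None:
            parts.append(f"ASSISTANT: {assistant_msg}</s>")
    parts.append("ASSISTANT:")  # leave the final turn open for the model to complete
    return " ".join(parts)
```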
### Example prompt (weights v0)
```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
### Human: Hello!
### Assistant: Hello!
### Human: How are you?
### Assistant: I am good.
```
See the full prompt template [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L238-L269).
## How to Apply Delta Weights (Only Needed for Weights v0)
We release [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) weights v0 as delta weights to comply with the LLaMA model license.
You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:
1. Get the original LLaMA weights in the Hugging Face format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
2. Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face [account](https://huggingface.co/lmsys).
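Conceptually, applying a delta is an elementwise addition of each delta tensor to the corresponding base tensor. A toy sketch with plain lists standing in for tensors (the real script operates on PyTorch state dicts):

```python
def apply_delta(base, delta):
    """Add delta weights to base weights, parameter by parameter.
    Both arguments map parameter names to (toy) 1-D weight lists."""
    return {name: [b + d for b, d in zip(base[name], delta[name])]
            for name in base}
```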
**NOTE**:
Weights v1.1 are only compatible with `transformers>=4.28.0` and `fschat>=0.2.0`.
Please update your local packages accordingly. If you do a fresh install following the FastChat installation instructions, you should get all the correct versions.
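When checking installed versions against these minimums, compare numerically rather than lexicographically (as a string, `"4.9.0"` would wrongly sort above `"4.28.0"`). A small sketch:

```python
def at_least(installed, minimum):
    """True if a dotted version string meets a minimum, compared numerically."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)
```

In practice, `packaging.version.parse` additionally handles pre-release suffixes that this sketch does not.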
#### Vicuna-7B
This conversion command needs around 30 GB of CPU RAM.
See the "Low CPU Memory Conversion" section below if you do not have enough memory.
Replace `/path/to/*` with the real paths.
```bash
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-7b \
    --target-model-path /path/to/output/vicuna-7b \
    --delta-path lmsys/vicuna-7b-delta-v1.1
```
#### Vicuna-13B
This conversion command needs around 60 GB of CPU RAM.
See the "Low CPU Memory Conversion" section below if you do not have enough memory.
Replace `/path/to/*` with the real paths.
```bash
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b \
    --target-model-path /path/to/output/vicuna-13b \
    --delta-path lmsys/vicuna-13b-delta-v1.1
```
#### Low CPU Memory Conversion
You can try these methods to reduce the CPU RAM requirement of weight conversion.
1. Append `--low-cpu-mem` to the commands above, which will split large weight files into smaller ones and use the disk as temporary storage. This can keep the peak memory at less than 16 GB.
2. Create a large swap file and rely on the operating system to automatically utilize the disk as virtual memory.
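The splitting in option 1 can be pictured as walking the parameter names one shard at a time, so only a fraction of the weights is ever resident in RAM. A simplified sketch (the real implementation shards actual weight files on disk):

```python
def chunked(names, shard_size):
    """Yield successive slices of `names`, one shard at a time."""
    for i in range(0, len(names), shard_size):
        yield names[i:i + shard_size]
```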
## FAQ

### Tokenizer issues
There are some frequently asked tokenizer issues ([#408](https://github.com/lm-sys/FastChat/issues/408)).
Some of them are related not only to FastChat or the Vicuna weights but also to how you converted the base LLaMA model.
We suggest that you use `transformers>=4.28.0` and redo the weight conversion for the base LLaMA model.
After applying the delta, you should have a file named `special_tokens_map.json` in your converted weight folder for either v0 or v1.1.
The contents of this file should be the same as [this file](https://huggingface.co/lmsys/vicuna-13b-delta-v0/blob/main/special_tokens_map.json).
If the file is not present, please copy the `special_tokens_map.json` and `tokenizer_config.json` files from [the delta repository](https://huggingface.co/lmsys/vicuna-13b-delta-v0/tree/main) to your converted weight folder. This works for both v0 and v1.1.
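A quick way to check a converted folder for the expected tokenizer side files (the function name is illustrative, not part of FastChat):

```python
import pathlib

def missing_tokenizer_files(weights_dir):
    """Return the tokenizer files absent from a converted weight folder."""
    folder = pathlib.Path(weights_dir)
    expected = ("special_tokens_map.json", "tokenizer_config.json")
    return [name for name in expected if not (folder / name).exists()]
```

An empty list means both files are present; otherwise copy the missing ones as described above.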