File size: 2,297 Bytes
27015bc d880cf5 27015bc d880cf5 ee0bcdd d880cf5 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- mistral
- inferentia2
- neuron
- neuronx
license: apache-2.0
---
# Neuronx for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - Updated Mistral 7B Model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) Using AWS Neuron SDK version 2.18~
This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.
Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
Note: To compile the mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to update the model config sliding_window (either file or model variable) from null to default 4096.
## Usage with 🤗 `TGI`
```shell
export HF_TOKEN="hf_xxx"
docker run -d -p 8080:80 \
-v $(pwd)/data:/data \
--device=/dev/neuron0 \
-e HF_TOKEN=${HF_TOKEN} \
public.ecr.aws/shtian/neuronx-tgi:latest \
--model-id davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18 \
--max-batch-size 1 \
--max-input-length 16 \
--max-total-tokens 32
```
## Usage with 🤗 `optimum-neuron`
```python
>>> from optimum.neuron import pipeline
>>> p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
now I would take my partner on a romantic getaway where we could lay on the grass in the park,
eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]
```
This repository contains tags specific to versions of `neuronx`. When using with 🤗 `optimum-neuron`, use the repo revision specific to the version of `neuronx` you are using, to load the right serialized checkpoints.
## Arguments passed during export
**input_shapes**
```json
{
"batch_size": 1,
"sequence_length": 2048,
}
```
**compiler_args**
```json
{
"auto_cast_type": "bf16",
"num_cores": 2,
}
``` |