metadata

language:
  - en
pipeline_tag: text-generation
inference: false
tags:
  - mistral
  - inferentia2
  - neuron
  - neuronx
license: apache-2.0

Neuronx for mistralai/Mistral-7B-Instruct-v0.2 - Updated Mistral 7B Model on AWS Inferentia2 Using AWS Neuron SDK version 2.18~

This model has been exported to the Neuron format using the input_shapes and compiler parameters listed under "Arguments passed during export" below.

Please refer to the 🤗 optimum-neuron documentation for an explanation of these parameters.

Note: To compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you first need to change `sliding_window` in the model config (either in the config file or on the loaded model's config) from null to the default value of 4096.
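The config-file variant of the note above could be done by hand, or with a small helper like the following. This is a minimal sketch, assuming a local copy of the model's `config.json`; the function name `set_sliding_window` is hypothetical and not part of optimum-neuron or transformers.

```python
import json
from pathlib import Path


def set_sliding_window(config_path, window=4096):
    """Replace a null `sliding_window` in a local config.json with `window`.

    Returns True if the file was modified, False if a value was already set.
    (Hypothetical helper, written for illustration only.)
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # JSON null loads as None; a missing key is treated the same way here.
    if config.get("sliding_window") is not None:
        return False

    config["sliding_window"] = window
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling `set_sliding_window("path/to/config.json")` on a downloaded copy of the model leaves every other config key untouched and does nothing if a window is already set.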

Usage with 🤗 `TGI`

export HF_TOKEN="hf_xxx"

docker run -d -p 8080:80 \
       -v $(pwd)/data:/data \
       --device=/dev/neuron0 \
       -e HF_TOKEN=${HF_TOKEN} \
       public.ecr.aws/shtian/neuronx-tgi:latest \
       --model-id davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18 \
       --max-batch-size 1 \
       --max-input-length 16 \
       --max-total-tokens 32
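Once the container is running, it can be queried over HTTP. The sketch below assumes the port mapping from the command above (host port 8080) and uses TGI's `/generate` route; the helper names `build_payload` and `generate` are illustrative, and the default of 16 new tokens is chosen only so that a short prompt stays within the 32-token total configured above.

```python
import json
import urllib.request

# Assumed endpoint: the container above maps host port 8080 to TGI's port 80.
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt, max_new_tokens=16):
    """Build the JSON body for TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt, max_new_tokens=16, url=TGI_URL, timeout=60):
    """POST a prompt to the running TGI container and return the generated text."""
    body = json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]
```

For example, `generate("My favorite place on earth is")` should return the completion as a string once the model has finished loading inside the container.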

Usage with 🤗 `optimum-neuron`

>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
now I would take my partner on a romantic getaway where we could lay on the grass in the park,
eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]

This repository contains tags specific to versions of neuronx. When using with 🤗 optimum-neuron, use the repo revision specific to the version of neuronx you are using, to load the right serialized checkpoints.

Arguments passed during export

input_shapes

{
  "batch_size": 1,
  "sequence_length": 2048
}

compiler_args

{
  "auto_cast_type": "bf16",
  "num_cores": 2
}
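For readers who want to reproduce the export, the arguments above map onto keyword arguments of `NeuronModelForCausalLM.from_pretrained`. The following is a sketch rather than the exact command used for this repository: it assumes an optimum-neuron release from the Neuron SDK 2.18 era, in which `export=True` accepts these names directly, and the wrapper `export_to_neuron` is hypothetical.

```python
# Values copied from the input_shapes and compiler_args listed above.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "bf16", "num_cores": 2}


def export_to_neuron(model_id, save_dir):
    """Compile `model_id` for Neuron and save the artifacts to `save_dir`.

    Must run on an Inferentia2 instance with optimum-neuron installed, so
    the import is deferred until the function is called.
    (Hypothetical wrapper, written for illustration only.)
    """
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **INPUT_SHAPES, **COMPILER_ARGS
    )
    model.save_pretrained(save_dir)
    return model
```

A call such as `export_to_neuron("mistralai/Mistral-7B-Instruct-v0.2", "mistral-neuron")` (the output directory name is arbitrary) would still require the `sliding_window` change described in the note earlier in this card.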
