Neuronx model for upstage/SOLAR-10.7B-v1.0
This repository contains AWS Inferentia2 and neuronx compatible checkpoints for upstage/SOLAR-10.7B-v1.0.
You can find detailed information about the base model on its Model Card.
This model card also includes instructions for how to compile other SOLAR models with other settings if this combination isn't quite what you are looking for.
This model has been exported to the neuron format using specific input_shapes and compiler parameters detailed in the paragraphs below.
It has been compiled to run on an inf2.24xlarge instance on AWS.
It was compiled with version 2.16 of the Neuron SDK, so make sure your environment has version 2.16 installed.
Please refer to the 🤗 optimum-neuron documentation for an explanation of these parameters.
Set up the environment
First, use the DLAMI image from Hugging Face. It has most of the utilities and drivers preinstalled. However, you may need to update to version 2.16 to use these binaries.
sudo apt-get update -y \
&& sudo apt-get install -y --no-install-recommends \
aws-neuronx-dkms=2.15.9.0 \
aws-neuronx-collectives=2.19.7.0-530fb3064 \
aws-neuronx-runtime-lib=2.19.5.0-97e2d271b \
aws-neuronx-tools=2.16.1.0
pip3 install --upgrade \
neuronx-cc==2.12.54.0 \
torch-neuronx==1.13.1.1.13.0 \
transformers-neuronx==0.9.474 \
--extra-index-url=https://pip.repos.neuron.amazonaws.com
pip3 install git+https://github.com/huggingface/optimum-neuron.git
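After installing, it can help to confirm that the Python packages match the pins above before loading the checkpoints. The following is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical, and the comparison assumes that an installed version may carry a local build suffix after a `+`, which is ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip3 command above.
EXPECTED = {
    "neuronx-cc": "2.12.54.0",
    "torch-neuronx": "1.13.1.1.13.0",
    "transformers-neuronx": "0.9.474",
}


def check_versions(expected=EXPECTED, get_version=version):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        # Ignore a local build suffix such as "+abc123" when comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in check_versions().items():
        print(f"{pkg}: expected {want}, found {have}")
```

An empty result means the three Python packages match the pins. The apt packages are not covered by this check.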
Running inference from this repository
from optimum.neuron import pipeline
p = pipeline('text-generation', 'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096')
p(
    "import socket\n\ndef ping_exponential_backoff(host: str):",
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    max_length=200,
)
[{generated text here}]
Compiling for different instances or settings
If this repository doesn't have the exact version or settings you need, you can compile your own.
(to be added)
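As a rough sketch of what such a compilation could look like, the snippet below exports the base model with optimum-neuron's `NeuronModelForCausalLM`, reusing the values listed under "Arguments passed during export" in this card. The function name `export_model` and the output directory are hypothetical, and the call assumes an Inferentia2 instance with the environment set up as described above. Adjust the shapes and `num_cores` to suit your instance.

```python
# Values copied from "Arguments passed during export" in this card.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 4096}
COMPILER_ARGS = {"auto_cast_type": "fp16", "num_cores": 12}


def export_model(
    model_id="upstage/SOLAR-10.7B-v1.0",
    output_dir="SOLAR-10.7B-v1.0-neuron",  # hypothetical local path
    input_shapes=INPUT_SHAPES,
    compiler_args=COMPILER_ARGS,
):
    """Compile model_id for Neuron and save the artifacts to output_dir."""
    # Imported lazily so this file can be read on machines without Neuron.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        export=True,  # compile instead of loading precompiled artifacts
        **input_shapes,
        **compiler_args,
    )
    model.save_pretrained(output_dir)
    return model


# On an Inferentia2 instance:
# export_model()
```

Compiling a model of this size can take a long time and needs substantial host memory, so it is worth checking the chosen settings before starting.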
This repository contains tags specific to versions of neuronx. When using with 🤗 optimum-neuron, use the repo revision specific to the version of neuronx you are using, to load the right serialized checkpoints.
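A minimal sketch of pinning a revision when loading, assuming `NeuronModelForCausalLM.from_pretrained` accepts the standard Hugging Face `revision` argument. The helper name `load_checkpoint` is hypothetical, and the tag in the usage comment is a placeholder, so check the repository's actual tags first.

```python
def load_checkpoint(repo_id: str, revision: str):
    """Load the serialized checkpoint stored under a specific tag or branch."""
    # Lazy import: optimum-neuron is only needed when this is actually called.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(repo_id, revision=revision)


# On an Inferentia2 instance ("<neuronx-tag>" is a placeholder, not a real tag):
# model = load_checkpoint(
#     "jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096", "<neuronx-tag>"
# )
```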
Arguments passed during export
input_shapes
{
  "batch_size": 1,
  "sequence_length": 4096
}
compiler_args
{
  "auto_cast_type": "fp16",
  "num_cores": 12
}