|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
inference: false |
|
tags: |
|
- pytorch |
|
- inferentia2 |
|
- neuron |
|
--- |
|
# Neuronx model for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) |
|
|
|
This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
|
You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0). |
|
|
|
This model card also includes instructions for compiling other SOLAR models with other settings, in case this combination isn't quite what you are looking for.
|
|
|
This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below. |
|
|
|
It has been compiled to run on an inf2.24xlarge instance on AWS, whose 12 NeuronCores match the `num_cores: 12` compiler setting listed at the end of this card.
|
|
|
**This model has been compiled using version 2.16 of the Neuron SDK. Make sure your environment has version 2.16 installed.**
|
|
|
Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters. |
|
|
|
## Set up the environment |
|
|
|
First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled. However, you may need to update to version 2.16 to use these binaries. |
|
|
|
```bash
sudo apt-get update -y \
 && sudo apt-get install -y --no-install-recommends \
    aws-neuronx-dkms=2.15.9.0 \
    aws-neuronx-collectives=2.19.7.0-530fb3064 \
    aws-neuronx-runtime-lib=2.19.5.0-97e2d271b \
    aws-neuronx-tools=2.16.1.0

pip3 install --upgrade \
    neuronx-cc==2.12.54.0 \
    torch-neuronx==1.13.1.1.13.0 \
    transformers-neuronx==0.9.474 \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com

pip3 install git+https://github.com/huggingface/optimum-neuron.git
```
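To double-check what actually got installed, a quick version printout (a minimal sketch; the package names are the ones pinned in the commands above):

```python
# Print the installed versions of the Neuron-related packages pinned above.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("neuronx-cc", "torch-neuronx", "transformers-neuronx", "optimum-neuron"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```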
|
## Running inference from this repository |
|
|
|
|
|
```python
from optimum.neuron import pipeline

p = pipeline('text-generation', 'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096')
p("import socket\n\ndef ping_exponential_backoff(host: str):",
  do_sample=True,
  top_k=10,
  temperature=0.1,
  top_p=0.95,
  num_return_sequences=1,
  max_length=200,
)
```
|
``` |
|
[{generated text here}] |
|
``` |
|
|
|
## Compiling for different instances or settings
|
|
|
If this repository doesn't have the exact version or settings you need, you can compile your own; a sketch follows below.
|
|
|
(to be added) |
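In the meantime, here is a minimal sketch using the 🤗 `optimum-neuron` Python API, reusing the `input_shapes` and `compiler_args` from this repository (listed at the end of this card); adjust them for your instance and check the keyword names against your installed version of the library:

```python
# Minimal export sketch: compile the base SOLAR checkpoint for Neuron.
# The keyword arguments mirror the input_shapes and compiler_args below.
from optimum.neuron import NeuronModelForCausalLM

model = NeuronModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-v1.0",
    export=True,             # compile instead of loading precompiled checkpoints
    batch_size=1,            # input_shapes
    sequence_length=4096,
    num_cores=12,            # compiler_args; 12 NeuronCores on an inf2.24xlarge
    auto_cast_type="fp16",
)

# Serialize the compiled artifacts so compilation only has to happen once.
model.save_pretrained("SOLAR-10.7B-v1.0-neuron")
```

Compiling a 10.7B model takes a while; once saved, the directory can be loaded locally or pushed to the Hub like this repository.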
|
|
|
This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repository revision that matches the version of `neuronx` you are running, so that the correct serialized checkpoints are used.
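For example, assuming `pipeline` forwards the `revision` argument the way the underlying `from_pretrained` call does (the tag name below is a placeholder; substitute one of the actual tags on this repository):

```python
from optimum.neuron import pipeline

# '<neuronx-version-tag>' is a placeholder for a real tag on this repository.
p = pipeline(
    'text-generation',
    'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096',
    revision='<neuronx-version-tag>',
)
```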
|
|
|
## Arguments passed during export |
|
|
|
**input_shapes** |
|
```json
{
  "batch_size": 1,
  "sequence_length": 4096
}
```
|
**compiler_args** |
|
|
|
```json
{
  "auto_cast_type": "fp16",
  "num_cores": 12
}
```
|
|