---
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
---

# PMC_LLaMA_13B - AWQ

- Model creator: [axiong](https://huggingface.co/axiong)
- Original model: [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B)

## Description

This repository contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B).

### About AWQ

[Activation-aware Weight Quantization (AWQ)](https://arxiv.org/abs/2306.00978) builds on the observation that a small fraction of weights (around 1%) matters most for LLM quality, and that these salient weights are best identified from activation statistics rather than from the weights themselves. Instead of keeping them in higher precision, AWQ scales the corresponding channels before quantization. This reduces quantization error and lets models run with 4-bit weights at little cost in accuracy.
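
The scaling idea can be illustrated with a toy NumPy sketch. This is not the AWQ implementation: the layer sizes, the fixed scale of 2.0, the choice of four salient channels, and the helper `quantize_rtn` are all assumptions made for illustration, whereas the actual method searches for per-channel scales and quantizes weights group-wise. The sketch picks the input channels with the largest average activation magnitude, scales their weights up before 4-bit round-to-nearest quantization, divides the activations by the same factor, and compares the output error against plain quantization.

```python
import numpy as np


def quantize_rtn(w, n_bits=4):
    """Symmetric round-to-nearest quantization with one step size per output row."""
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(w / step), -qmax - 1, qmax) * step


rng = np.random.default_rng(0)
n_samples, in_features, out_features = 512, 256, 128  # illustrative sizes

# Toy activations: a few input channels are much larger than the rest.
x = rng.normal(size=(n_samples, in_features))
salient = np.array([3, 40, 101, 200])  # hypothetical outlier channels
x[:, salient] *= 30.0
w = rng.normal(size=(out_features, in_features))

y_ref = x @ w.T  # full-precision output of the linear layer

# Baseline: quantize the weights directly.
y_rtn = x @ quantize_rtn(w).T

# Activation-aware variant: find salient channels from activation magnitude,
# scale their weights up before quantization and the activations down by the
# same factor, so the product is unchanged in full precision.
channel_mag = np.abs(x).mean(axis=0)
top = np.argsort(channel_mag)[-4:]
s = np.ones(in_features)
s[top] = 2.0  # fixed scale chosen for illustration only
y_scaled = (x / s) @ quantize_rtn(w * s).T

err_rtn = np.mean((y_ref - y_rtn) ** 2)
err_scaled = np.mean((y_ref - y_scaled) ** 2)
print(f"MSE, plain 4-bit rounding:    {err_rtn:.4f}")
print(f"MSE, with salient scaling:    {err_scaled:.4f}")
```
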

Example usage with the vLLM library:

```python
from vllm import LLM, SamplingParams

# Instruction-style prompt template with two placeholders.
prompt_input = (
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

examples = [
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "What is the mechanism of action of antibiotics?"
    },
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "How do statins work to lower cholesterol levels?"
    },
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "Tell me about Paracetamol"
    }
]

# Fill the template once per example.
prompt_batch = [prompt_input.format_map(example) for example in examples]

sampling_params = SamplingParams(temperature=0.8, max_tokens=512)

# Load the AWQ-quantized weights in half precision.
llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompt_batch, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Response: {generated_text}")
```