---
license: openrail
model_creator: axiong
model_name: PMC_LLaMA_13B
---
# PMC_LLaMA_13B - AWQ
- Model creator: [axiong](https://huggingface.co/axiong)
- Original model: [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B)
## Description
This repository contains AWQ model files for [PMC_LLaMA_13B](https://huggingface.co/axiong/PMC_LLaMA_13B).
### About AWQ
[Activation-aware Weight Quantization (AWQ)](https://arxiv.org/abs/2306.00978) identifies the small fraction of weights most critical to LLM performance and protects them during quantization, rather than treating all weights equally. This targeted approach minimizes quantization loss, allowing models to run in 4-bit precision with little degradation in quality.
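The core idea can be sketched in a toy NumPy example (illustrative only; the real AWQ algorithm searches for optimal per-group scales rather than using a fixed heuristic): input channels with large activation magnitudes are scaled up before round-to-nearest quantization, so their weights lose less relative precision, and the inverse scale is folded back afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)           # weight matrix (out x in)
act_mag = np.abs(rng.normal(size=8)).astype(np.float32)  # per-input-channel activation magnitude

def quantize_4bit(w):
    """Symmetric per-tensor 4-bit round-to-nearest quantization."""
    scale = np.abs(w).max() / 7.0
    return np.round(w / scale).clip(-8, 7) * scale

# Plain round-to-nearest quantization of all weights equally.
W_rtn = quantize_4bit(W)

# AWQ-style: scale salient input channels up before quantizing, then
# divide the scale back out (at inference the inverse scale can be
# migrated into the preceding activation, so it costs nothing extra).
s = act_mag ** 0.5              # simple scaling heuristic (alpha = 0.5)
W_awq = quantize_4bit(W * s) / s

# Compare reconstruction error on activations drawn with those magnitudes.
x = rng.normal(size=(64, 8)).astype(np.float32) * act_mag
err_rtn = np.mean((x @ W.T - x @ W_rtn.T) ** 2)
err_awq = np.mean((x @ W.T - x @ W_awq.T) ** 2)
```

Because the high-activation channels dominate the output error, protecting their weights with larger effective quantization resolution typically lowers the activation-weighted reconstruction error relative to plain round-to-nearest.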
Example usage with the vLLM library:
```python
from vllm import LLM, SamplingParams

prompt_input = (
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

examples = [
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "What is the mechanism of action of antibiotics?"
    },
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "How do statins work to lower cholesterol levels?"
    },
    {
        "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer the question.",
        "input": "Tell me about Paracetamol"
    }
]

# Fill the instruction/input slots of the prompt template for each example.
prompt_batch = [prompt_input.format_map(example) for example in examples]

sampling_params = SamplingParams(temperature=0.8, max_tokens=512)
llm = LLM(model="disi-unibo-nlp/pmc-llama-13b-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompt_batch, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(generated_text)
```