---
license: apache-2.0
---

# The Quantized LLaMA 3.1 8B Model

Original Base Model: `meta-llama/Meta-Llama-3.1-8B`.<br>
Link: [https://huggingface.co/meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)

## Quantization Configurations
```
"quantization_config": {
  "batch_size": 1,
  "bits": 4,
  "block_name_to_quantize": null,
  "cache_block_outputs": true,
  "damp_percent": 0.1,
  "dataset": null,
  "desc_act": false,
  "exllama_config": {
    "version": 1
  },
  "group_size": 128,
  "max_input_length": null,
  "model_seqlen": null,
  "module_name_preceding_first_block": null,
  "modules_in_block_to_quantize": null,
  "pad_token_id": null,
  "quant_method": "gptq",
  "sym": true,
  "tokenizer": null,
  "true_sequential": true,
  "use_cuda_fp16": false,
  "use_exllama": true
},
```

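To make the `bits`, `group_size`, and `sym` fields in the configuration above more concrete, here is a minimal sketch of group-wise symmetric quantization. It uses plain round-to-nearest, which is not the GPTQ algorithm itself (GPTQ additionally compensates for rounding error layer by layer). The function names, the unsigned grid with a fixed mid-point zero, and the storage assumption of one 16-bit scale plus one 4-bit zero point per group are illustrative assumptions, not taken from the linked source code.

```python
# Illustrative sketch only: round-to-nearest group quantization, NOT GPTQ.
# Values mirror the configuration above; everything else is an assumption.

BITS = 4          # "bits": 4
GROUP_SIZE = 128  # "group_size": 128


def quantize_group(weights, bits=BITS):
    """Symmetric quantization of one group onto an unsigned grid.

    Assumes codes in [0, 2**bits - 1] with a fixed zero point at the
    middle of the grid (8 for 4 bits) and one scale per group.
    """
    maxq = 2 ** bits - 1            # 15 for 4 bits
    zero = (maxq + 1) // 2          # 8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    # Symmetric range [-max_abs, +max_abs] spread over maxq steps.
    scale = (2 * max_abs) / maxq if max_abs > 0 else 1.0
    codes = [min(maxq, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE, bits=BITS):
    """Quantize a row in consecutive groups, each with its own scale."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, scale_bits=16, zero_bits=BITS):
    """Approximate storage cost per weight, including per-group metadata.

    Assumes one fp16 scale and one packed zero point per group.
    """
    return bits + (scale_bits + zero_bits) / group_size
```

Under these assumptions, each weight costs slightly more than 4 bits because every group of 128 weights also carries its own scale and zero point. A smaller `group_size` would track the weight distribution more closely at the price of more metadata.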
## Source Code
Quantization source code: [https://github.com/vkola-lab/medpodgpt/tree/main/quantization](https://github.com/vkola-lab/medpodgpt/tree/main/quantization).
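
Separately from the quantization scripts linked above, a checkpoint carrying this `quantization_config` can usually be loaded through the standard `transformers` API. The sketch below is one way to do it, under several assumptions. The helper `load_quantized_model` is hypothetical. `repo_id` is a placeholder for this checkpoint's Hub id or local path, which this card does not state. The configuration is assumed to be stored in the checkpoint's `config.json`. The environment is assumed to have `transformers`, `accelerate`, and a GPTQ backend (for example `optimum` with `auto-gptq`) installed.

```python
def load_quantized_model(repo_id, device_map="auto"):
    """Load a GPTQ-quantized checkpoint and its tokenizer.

    `repo_id` is a placeholder (assumption): pass the Hub id or local path
    of the quantized checkpoint, which this card does not specify.
    """
    if not repo_id:
        raise ValueError("repo_id must be a non-empty Hub id or local path")

    # Imported lazily so this helper can be defined without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The quantization settings are read from the checkpoint's config,
    # so no explicit quantization arguments are passed here.
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map=device_map)
    return model, tokenizer


# Example usage (hypothetical id; needs a GPU and the packages noted above):
# model, tokenizer = load_quantized_model("<your-quantized-checkpoint>")
# inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Since `use_exllama` is `true` in the configuration, the ExLlama kernels are expected to be used when a compatible GPU backend is available. Whether that holds depends on the installed packages.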