|
--- |
|
language: en |
|
tags: |
|
- llama |
|
- quantization |
|
- text-generation |
|
- cpu |
|
- gpu |
|
- efficient-inference |
|
license: apache-2.0 |
|
base_model: |
|
- meta-llama/Llama-3.1-8B |
|
--- |
|
|
|
# Llama 3.1 8B Q4_K_M GGUF Model |
|
|
|
## Overview |
|
This is a Q4_K_M-quantized version of the Llama 3.1 8B model, optimized for efficient inference on both CPU and GPU. The model was quantized with the llama.cpp toolchain, allowing it to run in resource-constrained environments. Quantization substantially reduces the model's memory footprint while preserving strong language generation capabilities.
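For a rough sense of the savings, the sketch below compares approximate weight storage at FP16 against Q4_K_M, assuming an average of about 4.8 bits per weight for Q4_K_M (K-quants mix precisions across tensors, so the exact file size varies by conversion):

```python
# Back-of-the-envelope weight-memory estimate for an 8B-parameter model.
# The ~4.8 bits/weight average for Q4_K_M is an assumption; actual GGUF
# file sizes differ slightly depending on the conversion.
params = 8e9

fp16_gb = params * 16 / 8 / 1e9   # 16 bits per weight -> ~16 GB
q4km_gb = params * 4.8 / 8 / 1e9  # ~4.8 bits per weight -> ~4.8 GB

print(f"FP16:   ~{fp16_gb:.1f} GB")
print(f"Q4_K_M: ~{q4km_gb:.1f} GB")
```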
|
|
|
The model was originally trained by Meta AI and has been converted to the GGUF format for compatibility with llama.cpp.
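If the GGUF file is hosted on the Hugging Face Hub, it can be fetched with `huggingface_hub` before loading. The `repo_id` and `filename` below are hypothetical placeholders; substitute the actual values shown on this model page:

```python
# Download the GGUF file from the Hub (pip install huggingface_hub).
# Both repo_id and filename are placeholders, not this repo's real values.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/Llama-3.1-8B-Q4_K_M-GGUF",
    filename="llama-3.1-8b-q4_k_m.gguf",
)
print(model_path)  # local cache path of the downloaded file
```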
|
|
|
## Model Details |
|
- **Base Model**: meta-llama/Llama-3.1-8B |
|
- **Quantization Type**: Q4_K_M (4-bit K-quant, medium variant)
|
- **Model Size**: 8B parameters |
|
- **Format**: GGUF (used for efficient loading in llama.cpp) |
|
- **Intended Use**: Text generation; efficient inference on CPUs and GPUs with a reduced memory footprint
|
|
|
## Intended Use |
|
The model is intended for text generation tasks and is optimized for efficient inference on both CPUs and GPUs, making it well suited to resource-constrained environments.
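Below is a minimal inference sketch using the `llama-cpp-python` bindings (`pip install llama-cpp-python`). The GGUF filename is a hypothetical placeholder, and the sampling parameters are illustrative defaults:

```python
# Minimal text-generation sketch with llama-cpp-python.
# model_path is a placeholder; point it at the downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.1-8b-q4_k_m.gguf",
    n_ctx=4096,       # context window; raise if you need longer prompts
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only runs
)

output = llm(
    "Explain quantization in one paragraph.",
    max_tokens=128,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```

On CPU-only machines, setting `n_gpu_layers=0` keeps all weights in system RAM; a partial offload (for example `n_gpu_layers=20`) is a common middle ground when VRAM is limited.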
|
|
|
## License |
|
This model is licensed under the Apache 2.0 License. |
|
|
|
--- |