vhab10/llama_3.1_8b_Q4_K_M-gguf


vhab10 committed on Sep 26

Commit a062dbe • 1 Parent(s): 318b0c1

Create README.md

Files changed (1)

README.md +35 -0

README.md ADDED

	@@ -0,0 +1,35 @@

+---
+language: en
+tags:
+- llama
+- quantization
+- text-generation
+- cpu
+- gpu
+- efficient-inference
+license: apache-2.0
+base_model:
+- meta-llama/Llama-3.1-8B
+---
+# Llama 3.1 8B Q4_K_M GGUF Model
+## Overview
+This is a quantized version of the Llama 3.1 8B model in Q4_K_M format, intended for efficient inference on both CPU and GPU. The model was quantized with llama.cpp so that it can run in resource-constrained environments. Quantization reduces the model's memory footprint relative to the full-precision weights while retaining most of the base model's language generation capability.
+The base model was trained by Meta AI and has been converted to the GGUF format for compatibility with llama.cpp.
+## Model Details
+- **Base Model**: meta-llama/Llama-3.1-8B
+- **Quantization Type**: Q4_K_M (4-bit k-quant, "medium" variant)
+- **Model Size**: 8B parameters
+- **Format**: GGUF (the model file format loaded by llama.cpp)
+- **Intended Use**: Text generation; inference on CPUs and GPUs with limited memory
+## Intended Use
+The model is intended for text generation tasks and is optimized for efficient inference on both CPUs and GPUs, making it suitable for use in resource-constrained environments.
+## License
+This model is licensed under the Apache 2.0 License.
+---
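
As a rough check on the README's claim that quantization reduces the memory footprint, the sketch below compares the approximate weight storage of an 8B-parameter model at 16 bits per weight against a 4-bit quantized format. The figure of 4.85 bits per weight for Q4_K_M is an assumption used for illustration, not a number taken from the README, and the function name `estimated_size_gib` is hypothetical. The estimate covers weights only and ignores file metadata, the KV cache, and runtime buffers.

```python
# Back-of-the-envelope estimate of weight storage before and after quantization.
# Illustrative only: real GGUF files differ because tensors use mixed precision.

GIB = 1024 ** 3  # bytes per GiB


def estimated_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bits per weight."""
    return n_params * bits_per_weight / 8 / GIB


N_PARAMS = 8.0e9      # nominal "8B parameters" from the README
FP16_BPW = 16.0       # 16-bit floating-point baseline
Q4_K_M_BPW = 4.85     # ASSUMPTION: approximate average for Q4_K_M, not from the README

fp16_gib = estimated_size_gib(N_PARAMS, FP16_BPW)
q4_gib = estimated_size_gib(N_PARAMS, Q4_K_M_BPW)

print(f"FP16 weights:   {fp16_gib:.1f} GiB")
print(f"Q4_K_M weights: {q4_gib:.1f} GiB")
print(f"Reduction:      {fp16_gib / q4_gib:.1f}x")
```

Under these assumptions the weights shrink from roughly 14.9 GiB to roughly 4.5 GiB. The actual file size depends on the exact parameter count and on which tensors llama.cpp keeps at higher precision.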