Felladrin/Llama-160M-Chat-v1-GGUF

Quantized GGUF model files for Llama-160M-Chat-v1 from Felladrin

| Name | Quant method | Size |
| ---- | ------------ | ---- |
| llama-160m-chat-v1.fp16.gguf | fp16 | 326.58 MB |
| llama-160m-chat-v1.q2_k.gguf | q2_k | 77.23 MB |
| llama-160m-chat-v1.q3_k_m.gguf | q3_k_m | 87.54 MB |
| llama-160m-chat-v1.q4_k_m.gguf | q4_k_m | 104.03 MB |
| llama-160m-chat-v1.q5_k_m.gguf | q5_k_m | 119.04 MB |
| llama-160m-chat-v1.q6_k.gguf | q6_k | 135.00 MB |
| llama-160m-chat-v1.q8_0.gguf | q8_0 | 174.33 MB |
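Any of the files above can be fetched programmatically. A minimal sketch using `huggingface_hub`; the repo ID is taken from this page, and the `QUANT_FILES` selection and helper name are illustrative:

```python
# Illustrative mapping of quant methods to filenames from the table above
QUANT_FILES = {
    "q4_k_m": "llama-160m-chat-v1.q4_k_m.gguf",  # ~104 MB, balanced size/quality
    "q8_0": "llama-160m-chat-v1.q8_0.gguf",      # ~174 MB, closest to fp16
}

def download_quant(quant: str = "q4_k_m") -> str:
    """Download one GGUF file from the Hub and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import

    return hf_hub_download(
        repo_id="afrideva/Llama-160M-Chat-v1-GGUF",
        filename=QUANT_FILES[quant],
    )
```

Smaller quants (q2_k, q3_k_m) trade noticeable quality for size; for a model this small, q4_k_m or above is usually a reasonable default.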

Original Model Card:

A Llama Chat Model of 160M Parameters

Recommended Prompt Format

The recommended prompt format is as follows:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
```
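Assuming the template above is applied literally, a small helper can build the prompt string (the function name is illustrative):

```python
def format_chat_prompt(system_message: str, user_message: str) -> str:
    """Build a ChatML-style prompt in the format recommended above."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The prompt deliberately ends after the assistant header, so the model's completion begins as the assistant's reply.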

Recommended Inference Parameters

For best results, use contrastive search for inference:

penalty_alpha: 0.5
top_k: 5
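In Hugging Face `transformers`, passing `penalty_alpha` together with `top_k` to `generate` enables contrastive search. A hedged sketch putting the recommended parameters to use; `max_new_tokens` and the `chat` helper are assumptions, and the model ID refers to the original (non-GGUF) checkpoint:

```python
# Recommended contrastive-search parameters from this card;
# max_new_tokens is an assumption, adjust to taste.
CONTRASTIVE_SEARCH_KWARGS = {
    "penalty_alpha": 0.5,
    "top_k": 5,
    "max_new_tokens": 256,
}

def chat(system_message: str, user_message: str) -> str:
    """Generate one assistant reply using contrastive search."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    model_id = "Felladrin/Llama-160M-Chat-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, **CONTRASTIVE_SEARCH_KWARGS)
    # Decode only the newly generated tokens, not the prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Contrastive search penalizes tokens whose hidden states are too similar to the existing context, which helps a model this small avoid repetitive output.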
Format: GGUF · Model size: 162M params · Architecture: llama

