
Palmyra-Fin-70B-32K-IMat-GGUF

Llama.cpp imatrix quantization of Writer/Palmyra-Fin-70B-32K

Original Model: Writer/Palmyra-Fin-70B-32K
Original dtype: BF16 (bfloat16)
Quantized by: llama.cpp b3504
IMatrix dataset: here


Files

IMatrix

Status: ✅ Available
Link: here

Common Quants

| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| Palmyra-Fin-70B-32K.Q8_0/* | Q8_0 | 74.98GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q6_K/* | Q6_K | 57.89GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q4_K.gguf | Q4_K | 42.52GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q3_K.gguf | Q3_K | 34.27GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q2_K.gguf | Q2_K | 26.38GB | ✅ Available | 🟢 IMatrix | 📦 No |

All Quants

| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| Palmyra-Fin-70B-32K.F32/* | F32 | 282.22GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.BF16/* | BF16 | 141.12GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.FP16/* | F16 | 141.12GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q8_0/* | Q8_0 | 74.98GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q6_K/* | Q6_K | 57.89GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q5_K/* | Q5_K | 49.95GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q5_K_S/* | Q5_K_S | 48.66GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q4_K.gguf | Q4_K | 42.52GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q4_K_S.gguf | Q4_K_S | 40.35GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ4_NL.gguf | IQ4_NL | 40.05GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ4_XS.gguf | IQ4_XS | 37.90GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q3_K.gguf | Q3_K | 34.27GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q3_K_L.gguf | Q3_K_L | 37.14GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q3_K_S.gguf | Q3_K_S | 30.91GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ3_M.gguf | IQ3_M | 31.94GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ3_S.gguf | IQ3_S | 30.91GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ3_XS.gguf | IQ3_XS | 29.31GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ3_XXS.gguf | IQ3_XXS | 27.47GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q2_K.gguf | Q2_K | 26.38GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.Q2_K_S.gguf | Q2_K_S | 24.47GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ2_M.gguf | IQ2_M | 24.12GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ2_S.gguf | IQ2_S | 22.24GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ2_XS.gguf | IQ2_XS | 21.14GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ2_XXS.gguf | IQ2_XXS | 19.10GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ1_M.gguf | IQ1_M | 16.75GB | ✅ Available | 🟢 IMatrix | 📦 No |
| Palmyra-Fin-70B-32K.IQ1_S.gguf | IQ1_S | 15.34GB | ✅ Available | 🟢 IMatrix | 📦 No |

Downloading using huggingface-cli

If you do not have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Download the specific file you want:

huggingface-cli download legraphista/Palmyra-Fin-70B-32K-IMat-GGUF --include "Palmyra-Fin-70B-32K.Q8_0.gguf" --local-dir ./

If the model file is big, it has been split into multiple files. To download them all to a local folder, run:

huggingface-cli download legraphista/Palmyra-Fin-70B-32K-IMat-GGUF --include "Palmyra-Fin-70B-32K.Q8_0/*" --local-dir ./
# see FAQ for merging GGUF's

Inference

Simple chat template

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|><|start_header_id|>user<|end_header_id|>

{next_user_prompt}<|eot_id|>

Chat template with system prompt

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|><|start_header_id|>user<|end_header_id|>

{next_user_prompt}<|eot_id|>

Llama.cpp

llama.cpp/main -m Palmyra-Fin-70B-32K.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
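
For example, a single-turn prompt built from the template above might look like this (a minimal sketch: the quant file, context size, and question are illustrative placeholders, and -e makes llama.cpp expand the \n escape sequences):

llama.cpp/main -m Palmyra-Fin-70B-32K.Q4_K.gguf --color -c 4096 -e \
  -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat are the main drivers of corporate bond spreads?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"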

FAQ

Why is the IMatrix not applied everywhere?

According to this investigation, it appears that only the lower quantizations benefit from the imatrix input (as per HellaSwag results).
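
For reference, this is roughly how an imatrix-aware quant is produced with llama.cpp (a sketch with placeholder file names and calibration data, not the exact commands used for this repo; older llama.cpp builds name the binaries imatrix and quantize instead of llama-imatrix and llama-quantize):

# compute the importance matrix from a calibration text file
./llama-imatrix -m Palmyra-Fin-70B-32K.BF16.gguf -f calibration.txt -o imatrix.dat
# quantize, feeding the importance matrix into the low-bit quant
./llama-quantize --imatrix imatrix.dat Palmyra-Fin-70B-32K.BF16.gguf Palmyra-Fin-70B-32K.Q4_K.gguf Q4_K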

How do I merge a split GGUF?

  1. Make sure you have gguf-split available
  2. Locate your GGUF chunks folder (ex: Palmyra-Fin-70B-32K.Q8_0)
  3. Run gguf-split --merge Palmyra-Fin-70B-32K.Q8_0/Palmyra-Fin-70B-32K.Q8_0-00001-of-XXXXX.gguf Palmyra-Fin-70B-32K.Q8_0.gguf
    • Make sure to point gguf-split to the first chunk of the split (see the example below).
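
For example, merging the Q8_0 chunks from this repo looks roughly like this (a sketch; the XXXXX chunk count comes from the actual file names in the folder, and newer llama.cpp builds ship the tool as llama-gguf-split):

# list the chunks to find the first one
ls Palmyra-Fin-70B-32K.Q8_0/
# merge, pointing at the first chunk; the output is a single GGUF
gguf-split --merge Palmyra-Fin-70B-32K.Q8_0/Palmyra-Fin-70B-32K.Q8_0-00001-of-XXXXX.gguf Palmyra-Fin-70B-32K.Q8_0.gguf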

Got a suggestion? Ping me @legraphista!
