inference: false

Ejafa's Vicuna Vanilla 1.1 7B GGML

These files are GGML format model files for Ejafa's Vicuna Vanilla 1.1 7B.

GGML files are for CPU + GPU inference using llama.cpp and the libraries and UIs that support this format, such as text-generation-webui.

How to run in llama.cpp

I use the following command line; adjust for your tastes and needs:

./main -t 8 -ngl 32 -m vicuna_7B_vanilla_1.1.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "prompt goes here"

Change -t 8 to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use -t 8.
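If you are not sure how many physical cores you have, the following is a minimal sketch for finding out. It assumes lscpu is available on Linux or sysctl on macOS, and it falls back to the logical CPU count otherwise. The helper name physical_cores is made up for this example and is not part of llama.cpp.

```shell
#!/bin/sh
# Sketch: guess the number of physical CPU cores to pass to -t.
# physical_cores is a hypothetical helper name, not a llama.cpp tool.
physical_cores() {
    n=''
    if command -v lscpu >/dev/null 2>&1; then
        # Linux: count unique (core, socket) pairs, skipping comment lines.
        n=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    elif [ "$(uname -s)" = "Darwin" ]; then
        # macOS: the kernel reports the physical core count directly.
        n=$(sysctl -n hw.physicalcpu 2>/dev/null)
    fi
    # If detection failed, fall back to the logical CPU count, then to 1.
    case "$n" in
        ''|*[!0-9]*|0) n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1) ;;
    esac
    echo "$n"
}

THREADS=$(physical_cores)
echo "Suggested flag: -t $THREADS"
```

On a machine with hyper-threading, the fallback path reports threads rather than cores, so treat the result as an upper bound in that case.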

Change -ngl 32 to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
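One way to keep a single command for both GPU and CPU-only machines is to build the -ngl flag conditionally. This is only a sketch: GPU_LAYERS is a hypothetical environment variable invented for this example, and the script prints the command rather than running it.

```shell
#!/bin/sh
# GPU_LAYERS is a hypothetical variable used only by this sketch.
# Leave it unset (or 0) on machines without GPU acceleration.
GPU_LAYERS="${GPU_LAYERS:-0}"

if [ "$GPU_LAYERS" -gt 0 ]; then
    NGL_FLAG="-ngl $GPU_LAYERS"   # offload this many layers to the GPU
else
    NGL_FLAG=""                   # CPU-only: omit the flag entirely
fi

# Print the resulting command so it can be checked before running it.
echo "./main -t 8 $NGL_FLAG -m vicuna_7B_vanilla_1.1.ggmlv3.q5_0.bin -p \"prompt goes here\""
```

For example, running the script with GPU_LAYERS=32 set in the environment prints the command with -ngl 32 included.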

If you want to have a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.
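As an illustration, the chat-style variant of the command above might look like the following sketch. It reuses the same model file and sampling flags and only swaps the prompt argument for -i -ins. The variable names MODEL and CHAT_CMD are just for this example.

```shell
#!/bin/sh
# Same settings as the one-shot command, with -p "..." replaced by -i -ins.
MODEL="vicuna_7B_vanilla_1.1.ggmlv3.q5_0.bin"
CHAT_CMD="./main -t 8 -ngl 32 -m $MODEL --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins"

# Print the command; run it from your llama.cpp directory to start the session.
echo "$CHAT_CMD"
```

As with the one-shot command, adjust -t and -ngl for your hardware before running it.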

Compatibility

I have uploaded both the original llama.cpp quant methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the new k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K).

Please refer to llama.cpp and TheBloke's GGML models for further explanation.

How to run in text-generation-webui

Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

Thanks

Thanks to TheBloke for inspiring and providing almost all of the readme here!

Thanks to Ejafa for providing checkpoints of the model.

Thanks to Georgi Gerganov and all of the awesome people in the AI community.