TheBloke/Marcoroni-7b-GGUF

Text Generation

TheBloke committed on Sep 19, 2023

Commit 9d77cac • 1 Parent(s): 5371a97

Upload README.md

Files changed (1)

README.md +20 -2

@@ -10,6 +10,18 @@ model_creator: AIDC-ai-business
 model_name: Marcoroni 7b
 model_type: llama
 pipeline_tag: text-generation
 quantized_by: TheBloke
 ---
@@ -61,17 +73,23 @@ Here is an incomplete list of clients and libraries that are known to support GG
 <!-- repositories-available start -->
 ## Repositories available
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Marcoroni-7b-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Marcoroni-7b-GGUF)
 * [AIDC-ai-business's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/AIDC-ai-business/Marcoroni-7b)
 <!-- repositories-available end -->
 <!-- prompt-template start -->
-## Prompt template: Unknown
 ```
 {prompt}
 ```
 <!-- prompt-template end -->
@@ -193,7 +211,7 @@ Windows CLI users: Use `set HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1` before running
 Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 ```shell
-./main -ngl 32 -m marcoroni-7b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
 ```
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

 model_name: Marcoroni 7b
 model_type: llama
 pipeline_tag: text-generation
+prompt_template: 'Below is an instruction that describes a task. Write a response
+  that appropriately completes the request.
+  ### Instruction:
+  {prompt}
+  ### Response:
+  '
 quantized_by: TheBloke
 ---
 <!-- repositories-available start -->
 ## Repositories available
+* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Marcoroni-7b-AWQ)
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Marcoroni-7b-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Marcoroni-7b-GGUF)
 * [AIDC-ai-business's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/AIDC-ai-business/Marcoroni-7b)
 <!-- repositories-available end -->
 <!-- prompt-template start -->
+## Prompt template: Alpaca
 ```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
 {prompt}
+### Response:
 ```
 <!-- prompt-template end -->
 Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 ```shell
+./main -ngl 32 -m marcoroni-7b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:"
 ```
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
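
The change above replaces the bare `{prompt}` placeholder with an Alpaca-style template. As a rough illustration of what that substitution looks like in practice, here is a minimal Python sketch that wraps a user instruction in the template. The names `ALPACA_TEMPLATE` and `build_alpaca_prompt` are hypothetical and not part of the README. The blank-line layout is taken from the `-p` argument in the updated `./main` command, which is an assumption about the intended spacing.

```python
# Alpaca-style template. The spacing mirrors the -p argument of the updated
# ./main command in this commit (assumed to be the intended layout).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Return the instruction wrapped in the Alpaca-style template.

    Hypothetical helper for illustration only.
    """
    # str.replace is used instead of str.format so that any literal braces
    # inside the instruction are passed through unchanged.
    return ALPACA_TEMPLATE.replace("{prompt}", instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Name three primary colours."))
```

The resulting string contains real newline characters, so it can be handed to any client that accepts a raw prompt without relying on that client to interpret `\n` escape sequences.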