RonanMcGovern committed · Commit 48c33a5 · 1 Parent(s): c3d9785
add link to 13B GPTQ
README.md CHANGED
@@ -2,7 +2,7 @@
 language:
 - en
 pipeline_tag: text-generation
-inference:
+inference: false
 tags:
 - facebook
 - meta
@@ -22,7 +22,7 @@ tags:
 
 Available models:
 - fLlama-7B ([bitsandbytes NF4](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling)), ([GGML](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-GGML)), ([GPTQ](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ)) - free
-- fLlama-13B ([bitsandbytes NF4](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling)) - paid
+- fLlama-13B ([bitsandbytes NF4](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling)), ([GPTQ](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling-GPTQ)) - paid
 
 ## Inference with Google Colab and HuggingFace 🤗
 
@@ -41,7 +41,7 @@ To run this you'll need to install llamaccp from ggerganov on github.
 ```
 ./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
 ```
-
+which will allow you to run a chatbot in your browser. The -ngl offloads layers to the Mac's GPU and gets very good token generation speed.
 
 ## Licensing and Usage
 
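For context on the line added in the last hunk: the `./server` command starts llama.cpp's built-in HTTP server, which serves the browser chatbot and also accepts completion requests directly. Below is a minimal sketch of querying that server from the command line, assuming llama.cpp's default bind address (127.0.0.1:8080) and its `/completion` JSON route; the prompt text and `n_predict` value are placeholders.

```
# Hypothetical request against a locally running llama.cpp server started with:
#   ./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
# Assumes the default host/port and the /completion endpoint; adjust to your build.
curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "List the functions available to you.", "n_predict": 128}'
```

The server replies with JSON containing the generated text (in the `content` field in the builds I've seen); the `-ngl 32` flag in the server command is what offloads those layers to the Mac's GPU.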