---
language:
  - en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
  - function calling
  - on-device language model
  - gguf
  - llama cpp
---
# Octopus V4-GGUF: Graph of language models


<p align="center">
- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderbaord</a>
</p>

<p align="center" width="100%">
  <a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>

**Acknowledgement**:  
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.


## Get Started
To run the models, please download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):
```bash
git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
```
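
Alternatively, a single quantized file can be fetched via the Hugging Face Hub instead of cloning the whole repository. A minimal sketch, assuming `huggingface_hub` is installed (check the table below for the exact file name you want):

```bash
pip install -U huggingface_hub
# Download just one quantized variant into the current directory:
huggingface-cli download NexaAIDev/octopus-v4-gguf Octopus-v4-Q4_K_M.gguf --local-dir .
```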

## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended) 

1. **Clone and compile:**

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
```
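
Note that recent llama.cpp revisions have replaced the Makefile with a CMake build; if `make` fails on a current checkout, this sketch (assuming CMake is installed) should produce equivalent binaries:

```bash
cmake -B build
cmake --build build --config Release
# Binaries are emitted under build/bin/ (e.g. build/bin/llama-cli)
```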

2. **Execute the Model:**

Run the following command in the terminal:

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
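
If you would rather keep the model loaded and query it over HTTP than run one-shot completions, llama.cpp also ships an HTTP server (built as `./server` in older checkouts, `llama-server` in newer ones); a minimal sketch:

```bash
./server -m ./path/to/octopus-v4-Q4_K_M.gguf -c 1024 --port 8080
# Then, from another terminal:
# curl http://localhost:8080/completion -d '{"prompt": "<|system|>...<|end|><|user|>...<|end|><|assistant|>", "n_predict": 256}'
```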

## Run with [Ollama](https://github.com/ollama/ollama)

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:

1. Install Ollama on your local machine. You can also follow the import guide from the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md):

```bash
git clone https://github.com/ollama/ollama.git ollama
```

2. Locate the local Ollama directory:
```bash
cd ollama
```

3. Create a `Modelfile` in your directory
```bash
touch Modelfile
``` 

4. In the Modelfile, include a `FROM` statement with the path to your local model, followed by the default parameters:

```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```
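
Optionally, a `TEMPLATE` directive can bake the chat format into the Modelfile so the special tokens don't have to be typed on every run; a sketch assuming the prompt format used in the `ollama run` example below:

```bash
TEMPLATE """<|system|>{{ .System }}<|end|><|user|>{{ .Prompt }}<|end|><|assistant|>"""
```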

5. Use the following command to add the model to Ollama:

```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```

6. Verify that the model has been successfully imported:

```bash
ollama ls
```

### Run the model
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
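
Once imported, the model can also be queried through Ollama's local REST API (served on port 11434 by default):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-v4-Q4_K_M",
  "prompt": "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>",
  "stream": false
}'
```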

### Dataset and Benchmark

* Evaluation questions were drawn from [MMLU](https://github.com/hendrycks/test) to measure performance.
* Response speed was measured with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) tool.


## Quantized GGUF Models

| Name                   | Quant method | Bits | Size    | Response (tokens/second) | Use Cases                                 |
| ---------------------- | ------------ | ---- | ------- | ------------------------ | ----------------------------------------- |
| Octopus-v4.gguf        |              |      | 7.64 GB | 27.64                  | extremely large                           |
| Octopus-v4-Q2_K.gguf   | Q2_K         | 2    | 1.42 GB | 54.20                  | strongly not recommended, high quality loss |
| Octopus-v4-Q3_K.gguf   | Q3_K         | 3    | 1.96 GB | 51.22                  | not recommended                           |
| Octopus-v4-Q3_K_S.gguf | Q3_K_S       | 3    | 1.68 GB | 51.78                  | not very recommended                      |
| Octopus-v4-Q3_K_M.gguf | Q3_K_M       | 3    | 1.96 GB | 50.86                  | not very recommended                      |
| Octopus-v4-Q3_K_L.gguf | Q3_K_L       | 3    | 2.09 GB | 50.05                  | not very recommended                      |
| Octopus-v4-Q4_0.gguf   | Q4_0         | 4    | 2.18 GB | 65.76                  | good quality, recommended                 |
| Octopus-v4-Q4_1.gguf   | Q4_1         | 4    | 2.41 GB | 69.01                  | slow, good quality, recommended           |
| Octopus-v4-Q4_K.gguf   | Q4_K         | 4    | 2.39 GB | 55.76                  | slow, good quality, recommended           |
| Octopus-v4-Q4_K_S.gguf | Q4_K_S       | 4    | 2.19 GB | 53.98                  | high quality, recommended                 |
| Octopus-v4-Q4_K_M.gguf | Q4_K_M       | 4    | 2.39 GB | 58.39                  | some function-calling loss, not very recommended |
| Octopus-v4-Q5_0.gguf   | Q5_0         | 5    | 2.64 GB | 61.98                  | slow, good quality                        |
| Octopus-v4-Q5_1.gguf   | Q5_1         | 5    | 2.87 GB | 63.44                  | slow, good quality                        |
| Octopus-v4-Q5_K.gguf   | Q5_K         | 5    | 2.82 GB | 58.28                  | moderate speed, recommended               |
| Octopus-v4-Q5_K_S.gguf | Q5_K_S       | 5    | 2.64 GB | 59.95                  | moderate speed, recommended               |
| Octopus-v4-Q5_K_M.gguf | Q5_K_M       | 5    | 2.82 GB | 53.31                  | fast, good quality, recommended           |
| Octopus-v4-Q6_K.gguf   | Q6_K         | 6    | 3.14 GB | 52.15                  | large, not very recommended               |
| Octopus-v4-Q8_0.gguf   | Q8_0         | 8    | 4.06 GB | 50.10                  | very large, good quality                  |
| Octopus-v4-f16.gguf    | f16          | 16   | 7.64 GB | 30.61                  | extremely large                           |

_Quantized with llama.cpp_