[doc] update README
README.md (CHANGED)
@@ -33,35 +33,23 @@ To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...`
 curl http://localhost:18888/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "openchat_v3.
+    "model": "openchat_v3.2",
     "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
   }'
 ```
 
 </details>
 
-| Model
-|
-| OpenChat 3.
-| OpenChat 3.
+| Model        | Size | Context | Weights                                                       | Serving |
+|--------------|------|---------|---------------------------------------------------------------|---------|
+| OpenChat 3.2 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.2)  | `python -m ochat.serving.openai_api_server --model-type openchat_v3.2 --model openchat/openchat_v3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
+| OpenChat 3.1 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.1)  | `python -m ochat.serving.openai_api_server --model-type openchat_v3.1_llama2 --model openchat/openchat_v3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
 
 For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:
 
 <details>
 <summary>Conversation templates (click to expand)</summary>
 
-V3.1
-
-```python
-# Single-turn V3.1
-tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant:")
-# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
-
-# Multi-turn V3.1
-tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
-# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
-```
-
 V3.2
 
 ```python
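As a companion to the curl call in the hunk above, the same request can be issued from Python. A minimal sketch, assuming the pre-1.0 `openai` package (the same client whose `openai.api_base` the benchmarks section sets) and a server started with `--api-keys sk-KEY1`:

```python
import openai

# Point the (pre-1.0) openai client at the local OpenChat server.
openai.api_base = "http://localhost:18888/v1"
openai.api_key = "sk-KEY1"  # required only if the server was launched with --api-keys

response = openai.ChatCompletion.create(
    model="openchat_v3.2",
    messages=[{"role": "user",
               "content": "You are a large language model named OpenChat. "
                          "Write a poem to describe yourself"}],
)
print(response["choices"][0]["message"]["content"])
```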
@@ -74,6 +62,18 @@ tokenize("GPT4 User: Hello<|end_of_turn|>GPT4 Assistant: Hi<|end_of_turn|>GPT4 User: How are you today?<|end_of_turn|>GPT4 Assistant:")
 # Result: [1, 402, 7982, 29946, 4911, 29901, 15043, 32000, 402, 7982, 29946, 4007, 22137, 29901, 6324, 32000, 402, 7982, 29946, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 402, 7982, 29946, 4007, 22137, 29901]
 ```
 
+V3.1
+
+```python
+# Single-turn V3.1
+tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant:")
+# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
+
+# Multi-turn V3.1
+tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
+# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
+```
+
 </details>
 
 ## <a id="benchmarks"></a> Benchmarks
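The template strings above are raw prompts; for the Transformers path this hunk describes ("slow and not recommended"), one possible wiring is sketched below. This is not the project's official inference code: the loading options and generation settings are assumptions, and `<|end_of_turn|>` is taken as the stop token per the V3.2 template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_v3.2")
model = AutoModelForCausalLM.from_pretrained("openchat/openchat_v3.2", device_map="auto")

# Build a single-turn V3.2 prompt exactly as in the template above.
prompt = "GPT4 User: Hello<|end_of_turn|>GPT4 Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop generation at <|end_of_turn|>, the turn delimiter used by the template.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|end_of_turn|>"),
)
# Decode only the newly generated assistant turn.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```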
@@ -82,16 +82,16 @@ We have evaluated our models using the two most popular evaluation benchmarks **AlpacaEval** and **MT-bench**.
 
 To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the `openai.api_base` to `http://localhost:18888/v1` in the benchmark program.
 
-| **Model** | **Size** | **Context** | **💲Free** | **AlpacaEval (win rate %)** | **MT-bench (
-|
-| | | |
-| GPT-4 | 1.8T* | 8K |
-| ChatGPT | 175B* | 4K | ❌ | 89.4 | 7.94 |
-| Llama-2-70B-Chat | 70B | 4K | ✅ | 92.7 | 6.86 |
-| **OpenChat 3.
-| **OpenChat 3.
-| Llama-2-13B-Chat | 13B | 4K | ✅ | 81.0 | 6.65 |
-| Vicuna 1.3 | 13B | 2K | ❌ | 82.1 | 6.00 |
+| **Model**        | **Size** | **Context** | **💲Free** | **AlpacaEval (win rate %)** | **MT-bench (win rate adjusted %)** | **MT-bench (score)** |
+|------------------|----------|-------------|------------|-----------------------------|------------------------------------|----------------------|
+|                  |          |             |            | **v.s. text-davinci-003**   | **v.s. ChatGPT**                   |                      |
+| GPT-4            | 1.8T*    | 8K          | ❌         | 95.3                        | 82.5                               | 8.99                 |
+| ChatGPT          | 175B*    | 4K          | ❌         | 89.4                        | 50.0                               | 7.94                 |
+| Llama-2-70B-Chat | 70B      | 4K          | ✅         | 92.7                        |                                    | 6.86                 |
+| **OpenChat 3.2** | **13B**  | **4K**      | ✅         | **89.1**                    | **51.6**                           | **7.01**             |
+| **OpenChat 3.1** | **13B**  | **4K**      | ✅         | **89.5**                    | **50.0**                           | **6.65**             |
+| Llama-2-13B-Chat | 13B      | 4K          | ✅         | 81.0                        |                                    | 6.65                 |
+| Vicuna 1.3       | 13B      | 2K          | ❌         | 82.1                        | 37.5                               | 6.00                 |
 
 *: Estimated model size
 
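A note on the new "win rate adjusted" column: the numbers are consistent with the common convention that a tie counts as half a win, which would explain why ChatGPT scores exactly 50.0 against itself. A sketch of that presumed computation; the function and the convention are inferences, not taken from the benchmark code:

```python
def adjusted_win_rate(wins: int, ties: int, losses: int) -> float:
    """Presumed MT-bench adjustment: each tie counts as half a win
    (an assumption, inferred from ChatGPT scoring exactly 50.0 vs. itself)."""
    total = wins + ties + losses
    return 100.0 * (wins + 0.5 * ties) / total

# Sanity check: a model that ties itself on every question lands at 50.0.
assert adjusted_win_rate(wins=0, ties=80, losses=0) == 50.0
```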