dreamerdeo
committed on
Update README.md
README.md CHANGED
@@ -61,37 +61,65 @@ Through systematic experiments to determine the weights of different languages,
 The approach boosts their performance on SEA languages while maintaining proficiency in English and Chinese without significant compromise.
 Finally, we continually pre-train the Qwen1.5-0.5B model with 400 billion tokens, and the other models with 200 billion tokens, to obtain the Sailor models.
 
-The code for Sailor is available in the latest Hugging Face `transformers`, and we advise you to install `transformers>=4.37.0`.
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-device = "cuda" # the device to load the model
-
-model = AutoModelForCausalLM.from_pretrained("sail/Sailor-7B", torch_dtype="auto", device_map="auto")
-tokenizer = AutoTokenizer.from_pretrained("sail/Sailor-7B")
-
-input_message = "Model bahasa adalah model probabilistik"
-### The given Indonesian input translates to 'A language model is a probabilistic model.'
-
-model_inputs = tokenizer([input_message], return_tensors="pt").to(device)
-
-generated_ids = model.generate(
-    model_inputs.input_ids,
-    max_new_tokens=64
-)
-
-response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-print(response)
-```
+### How to run with `llama.cpp`
+
+```shell
+# install llama.cpp
+git clone https://github.com/ggerganov/llama.cpp.git
+cd llama.cpp
+make
+pip install -r requirements.txt
+
+# generate with llama.cpp ("Cara memanggang ikan?" is Indonesian for "How to grill fish?")
+./main -ngl 40 -m ggml-model-Q4_K_M.gguf -p "<|im_start|>question\nCara memanggang ikan?\n<|im_start|>answer\n" --temp 0.7 --repeat_penalty 1.1 -n 400 -e
+```
+
+> Change `-ngl 40` to the number of layers to offload to the GPU. Remove it if you don't have GPU acceleration.
+
+### How to run with `llama-cpp-python`
+
+```shell
+pip install llama-cpp-python
+```
+
+```python
+import llama_cpp
+import llama_cpp.llama_tokenizer
+
+# load model
+llama = llama_cpp.Llama.from_pretrained(
+    repo_id="sail/Sailor-4B-Chat-gguf",
+    filename="ggml-model-Q4_K_M.gguf",
+    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("sail/Sailor-4B-Chat"),
+    n_gpu_layers=40,
+    n_threads=8,
+    verbose=False,
+)
+
+# Sailor chat uses 'question'/'answer' role tags instead of 'user'/'assistant'
+system_role = 'system'
+user_role = 'question'
+assistant_role = 'answer'
+
+system_prompt = (
+    'You are an AI assistant named Sailor created by Sea AI Lab. '
+    'Your answer should be friendly, unbiased, faithful, informative and detailed.'
+)
+system_prompt = f"<|im_start|>{system_role}\n{system_prompt}<|im_end|>"
+
+# inference example ("Cara memanggang ikan?" is Indonesian for "How to grill fish?")
+output = llama(
+    system_prompt + '\n' + f"<|im_start|>{user_role}\nCara memanggang ikan?\n<|im_start|>{assistant_role}\n",
+    max_tokens=256,
+    temperature=0.7,
+    top_p=0.75,
+    top_k=60,
+    stop=["<|im_end|>", "<|endoftext|>"]
+)
+
+print(output['choices'][0]['text'])
+```
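+
+The same call also supports token streaming: with `stream=True` it returns an iterator of partial completions instead of a single dict. A minimal sketch using the standard `llama-cpp-python` streaming interface, reusing `llama`, `system_prompt`, and the role tags defined above:
+
+```python
+# stream the answer token by token instead of waiting for the full completion
+for chunk in llama(
+    system_prompt + '\n' + f"<|im_start|>{user_role}\nCara memanggang ikan?\n<|im_start|>{assistant_role}\n",
+    max_tokens=256,
+    temperature=0.7,
+    stop=["<|im_end|>", "<|endoftext|>"],
+    stream=True,
+):
+    print(chunk['choices'][0]['text'], end='', flush=True)
+```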
+
+### How to build demo
+
+Install `llama-cpp-python` and `gradio`, then run the [script](https://github.com/sail-sg/sailor-llm/blob/main/demo/llamacpp_demo.py).
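+
+For reference, a minimal sketch of such a demo (illustrative only, not the contents of the linked script; it reuses the repo and file names from the example above):
+
+```python
+import llama_cpp
+import llama_cpp.llama_tokenizer
+import gradio as gr
+
+# load the quantized model once at startup (same settings as in the example above)
+llama = llama_cpp.Llama.from_pretrained(
+    repo_id="sail/Sailor-4B-Chat-gguf",
+    filename="ggml-model-Q4_K_M.gguf",
+    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("sail/Sailor-4B-Chat"),
+    n_gpu_layers=40,
+    verbose=False,
+)
+
+def answer(question):
+    # wrap the user text in Sailor's question/answer prompt format
+    prompt = f"<|im_start|>question\n{question}\n<|im_start|>answer\n"
+    output = llama(prompt, max_tokens=256, temperature=0.7,
+                   stop=["<|im_end|>", "<|endoftext|>"])
+    return output['choices'][0]['text']
+
+# a single-textbox Gradio interface around the model
+gr.Interface(fn=answer, inputs="text", outputs="text", title="Sailor-4B-Chat").launch()
+```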
 
 # License
 