---
license: llama2
datasets:
- tiiuae/falcon-refinedweb
- EleutherAI/pile
- meta-math/MetaMathQA
language:
- en
library_name: transformers
---
# Saily 220B
<img src="https://i.ibb.co/rG8S6cF/Saily-220-B.png" style="width: 100%; height: auto;"/>

---
## Announcements
**1.** <b>Date:</b> 17th December, 2023

Releasing v1. Saily_220B is a powerful AI model built on top of Llama2-70B merges.
We created 10 fine-tuned **Llama2 70B** models. All of them were fine-tuned on a common part of the RefinedWeb dataset, and each was then fine-tuned individually on a niche-specific dataset:
- Code
- Humor
- Maths
- Logical Understanding
- Physics
- Reasoning
- Psychology
- Roleplay

We then created 4 linear merges, keeping the **Logical Understanding** and **Reasoning** models constant in all of them, and finally combined the results in a passthrough merge (a sketch of the linear-merge step follows below).
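For readers unfamiliar with the terminology: a linear merge is a weighted average of corresponding model parameters. The exact recipe, weights, and tooling we used are not published; the snippet below is only a minimal illustrative sketch, with hypothetical checkpoint paths and an arbitrary 50/50 weighting.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical paths to two of the fine-tuned 70B checkpoints.
model_a = AutoModelForCausalLM.from_pretrained("path/to/finetune-a", torch_dtype=torch.float16)
model_b = AutoModelForCausalLM.from_pretrained("path/to/finetune-b", torch_dtype=torch.float16)

# A linear merge averages corresponding parameters; 0.5/0.5 is an arbitrary choice here.
merged_state = model_a.state_dict()
for name, tensor_b in model_b.state_dict().items():
    merged_state[name] = 0.5 * merged_state[name] + 0.5 * tensor_b

model_a.load_state_dict(merged_state)
model_a.save_pretrained("path/to/linear-merge-ab")
```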
Public datasets used:
1. [RefinedWeb](https://hf.co/datasets/tiiuae/falcon-refinedweb) (a subset)
2. [Pile](https://hf.co/datasets/EleutherAI/pile) (a subset)
3. [MetaMathQA](https://hf.co/datasets/meta-math/MetaMathQA)
4. Unnatural Code (JavaScript, Python, C++)

### How did we create the private dataset?
We recorded many internal brainstorming sessions where we just talked about random things.
We also invited many experts from different fields:
- Mathematicians
- Developers
- Bio-Engineers
- Authors
- Psychologists
- and others...

We talked about different topics with them, recorded the sessions, and then transcribed the audio to create the datasets.
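We haven't published the transcription tooling. As an illustration only, an open-source speech-to-text model such as Whisper can turn a recorded session into raw text for a dataset (the file name below is hypothetical):

```python
import whisper  # pip install openai-whisper

# Illustrative only: our actual transcription pipeline is not disclosed.
asr = whisper.load_model("large-v2")
result = asr.transcribe("brainstorming_session_001.wav")
print(result["text"])  # raw transcript, ready for cleaning and formatting
```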
---
### Please don't rely on the config.json in the files; it isn't accurate. You can run:
```python
from transformers import AutoModelForCausalLM as amclm

model = amclm.from_pretrained("deepnight-research/saily_220b",
                              device_map="auto")

print(model.config)
```
to check out the model's configuration.

---
### Try it:

You definitely need GPUs here (that goes without saying).
* We have tried it on **4 x A100 80GB** and **2 x A100 80GB**.
* You will have to load the model in **4-bit** to fit on **2 x A100 (80GB)**; see the 4-bit sketch after the example below.
```python
from transformers import AutoModelForCausalLM as amclm
from transformers import AutoTokenizer

model_name = "deepnight-research/saily_220b"
model = amclm.from_pretrained(model_name, device_map="auto")

# To load in 8-bit, make sure you have bitsandbytes installed:
# model = amclm.from_pretrained(model_name,
#                               device_map="auto",
#                               load_in_8bit=True)

# Float16:
# import torch
# model = amclm.from_pretrained(model_name,
#                               device_map="auto",
#                               torch_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n",
                             return_tensors="pt").to(model.device)

output = model.generate(input_ids,
                        max_length=128,
                        do_sample=True,  # required for temperature/top_p/top_k to take effect
                        temperature=0.7,
                        repetition_penalty=1.1,
                        top_p=0.7,
                        top_k=50)

output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```
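As noted above, fitting on **2 x A100 (80GB)** requires loading in 4-bit. Here is a minimal sketch using the bitsandbytes integration in transformers; the NF4/bfloat16 settings below are our assumption of a sensible default, not a tested recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes (pip install bitsandbytes accelerate).
# NF4 with bfloat16 compute is a common default, assumed here rather than validated.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepnight-research/saily_220b",
    device_map="auto",
    quantization_config=quant_config,
)
```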
We recommend following the **Alpaca Prompt Format**; if you're trying the model out in Text-Generation-WebUI, please use **INSTRUCT** or **CHAT-INSTRUCT** mode.
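For reference, the standard Alpaca prompt template looks like this (the instruction string is just an example; swap in your own):

```python
instruction = "Write a poem about cats"  # example instruction

# Standard Alpaca template (instruction-only variant, no input field).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
```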
---

## Limitations and Bias
As with all language models, Saily_220B may generate incorrect or biased content. It's important to keep this in mind when using the model.

---

## Wanna Talk?
Reach out to us at [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected])