Update README.md
README.md
CHANGED
@@ -18,8 +18,18 @@ The model features a context length of 1024, but in theory, it can be extended i
 
 ```
 rwkv-decepticon-char-20m.pth is to be used with vocab.json. This is a character-level model.
+PARAMS:
+n_layer: 6
+n_embd: 512
+ctx_len: 1024
 rwkv-decepticon-70m.pth (coming soon) is to be used with 20B_tokenizer.json.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 rwkv-decepticon-170m.pth (coming soon) is trained on a small subset of the SlimPajama dataset (6 GB). This also uses the 20B_tokenizer.json file.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 ```
 
 I would like to train a 7B-parameter model but lack the compute required. If you would like to sponsor some compute, please contact me.
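The added PARAMS block pins down which tokenizer file pairs with which checkpoint. Below is a minimal sketch of wiring both up; it is not the repo's own loader. The `{char: id}` layout of vocab.json and the `emb.weight` checkpoint key (RWKV-LM convention) are assumptions, and the 20B file is read with the Hugging Face `tokenizers` library.

```python
import json

import torch
from tokenizers import Tokenizer  # pip install tokenizers

# --- rwkv-decepticon-char-20m: character-level, n_layer=6, n_embd=512, ctx_len=1024 ---
with open("vocab.json") as f:
    stoi = json.load(f)  # assumed layout: {"a": 0, "b": 1, ...}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

state = torch.load("rwkv-decepticon-char-20m.pth", map_location="cpu")
# "emb.weight" assumes RWKV-LM checkpoint naming; expect shape (len(stoi), 512).
print(state["emb.weight"].shape)

# --- rwkv-decepticon-70m / -170m (coming soon): BPE via 20B_tokenizer.json ---
tok = Tokenizer.from_file("20B_tokenizer.json")
print(tok.encode("RWKV is an RNN with transformer-level performance.").ids)
```

Either way, keep inputs at or under ctx_len (1024) unless the model has been fine-tuned for longer contexts.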