jono1234 committed
Commit 2c06851
1 Parent(s): a2ccb6e

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -18,8 +18,18 @@ The model features a context length of 1024, but in theory, it can be extended i
 
 ```
 rwkv-decepticon-char-20m.pth is to be used with vocab.json. This is a character-level model.
+PARAMS:
+n_layer: 6
+n_embd: 512
+ctx_len: 1024
 rwkv-decepticon-70m.pth (coming soon) is to be used with 20B_tokenizer.json.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 rwkv-decepticon-170m.pth (coming soon) is trained on a small subset of the SlimPajama dataset (6 GB). This also uses the 20B_tokenizer.json file.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 ```
 
 I would like to train a 7B parameter model but lack the compute required. If you would like to sponsor some compute, please contact me.
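
For reference, here is a minimal inference sketch using BlinkDL's `rwkv` pip package. This loading code is not part of the commit; it assumes the checkpoints follow the standard RWKV-4 `.pth` layout that the package expects, and it uses the file names from the README above.

```python
# Minimal inference sketch (assumption: the checkpoints use the standard
# RWKV-4 .pth layout expected by BlinkDL's `rwkv` package, pip install rwkv).
# n_layer / n_embd are stored in the checkpoint and detected automatically;
# the PARAMS block in the README documents what each file contains.
import os

os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"  # pure-PyTorch path, no custom CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Per the README, rwkv-decepticon-70m.pth pairs with 20B_tokenizer.json.
model = RWKV(model="rwkv-decepticon-70m.pth", strategy="cpu fp32")
pipeline = PIPELINE(model, "20B_tokenizer.json")

print(pipeline.generate("The quick brown fox", token_count=60))

# The char-level rwkv-decepticon-char-20m.pth pairs with vocab.json instead;
# it needs per-character encoding against that vocab rather than PIPELINE's
# BPE tokenizer, so it is not shown here.
```

Note that ctx_len is 1024 for all three checkpoints, so prompts plus generated tokens beyond that length fall outside the trained context window.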