Update README.md
README.md
CHANGED
@@ -18,8 +18,18 @@ The model features a context length of 1024, but in theory, it can be extended i
 
 ```
 rwkv-decepticon-char-20m.pth is to be used with vocab.json. This is a character-level model.
+PARAMS:
+n_layer: 6
+n_embd: 512
+ctx_len: 1024
 rwkv-decepticon-70m.pth (coming soon) is to be used with 20B_tokenizer.json.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 rwkv-decepticon-170m.pth (coming soon) is trained on a small subset of the SlimPajama dataset (6 GB). This also uses the 20B_tokenizer.json file.
+n_layer: 8
+n_embd: 768
+ctx_len: 1024
 ```
 
 I would like to train a 7B-parameter model but lack the compute required. If you would like to sponsor some compute, please contact me.
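The added PARAMS block pins down which tokenizer file pairs with which checkpoint. Below is a minimal sketch of wiring both up; it is not the repo's own loader. The `{char: id}` layout of vocab.json and the `emb.weight` checkpoint key (RWKV-LM convention) are assumptions, and the 20B file is read with the Hugging Face `tokenizers` library.

```python
import json

import torch
from tokenizers import Tokenizer  # pip install tokenizers

# --- rwkv-decepticon-char-20m: character-level, n_layer=6, n_embd=512, ctx_len=1024 ---
with open("vocab.json") as f:
    stoi = json.load(f)  # assumed layout: {"a": 0, "b": 1, ...}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

state = torch.load("rwkv-decepticon-char-20m.pth", map_location="cpu")
# "emb.weight" assumes RWKV-LM checkpoint naming; expect shape (len(stoi), 512).
print(state["emb.weight"].shape)

# --- rwkv-decepticon-70m / -170m (coming soon): BPE via 20B_tokenizer.json ---
tok = Tokenizer.from_file("20B_tokenizer.json")
print(tok.encode("RWKV is an RNN with transformer-level performance.").ids)
```

Either way, keep inputs at or under ctx_len (1024) unless the model has been fine-tuned for longer contexts.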