|
---
license: other
license_name: rwkv-decepticon
license_link: LICENSE
datasets:
- roneneldan/TinyStories
language:
- en
---
|
|
|
# Dataset |
|
This model was trained on the TinyStories dataset, specifically the version generated with GPT-4.
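For reference, the dataset is published on the Hugging Face Hub under the ID listed in this card's metadata. A minimal sketch of pulling it with the `datasets` library (the `text` column name reflects the dataset's current layout and is an assumption here, not something specified in this card):

```python
from datasets import load_dataset  # pip install datasets

# Dataset ID taken from this card's metadata.
ds = load_dataset("roneneldan/TinyStories")
print(ds["train"][0]["text"])  # each example is a short story
```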
|
|
|
# The Model |
|
The name "Decepticon" stems from the model's unique architecture, which combines elements of both Transformer and RNN architechtures. This fusion creates a deceptive yet beneficial design. |
|
|
|
The model was trained with a context length of 1024, but in theory the usable context can be extended indefinitely through fine-tuning.
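That claim follows from the RNN view: at inference time the model carries a fixed-size recurrent state, so no attention window caps the history. A minimal sketch using the stateful `forward` of the `rwkv` pip package (the checkpoint path and token ids below are placeholders, not files guaranteed by this repo):

```python
from rwkv.model import RWKV  # pip install rwkv

# Placeholder checkpoint path; the strategy string follows the rwkv package's format.
model = RWKV(model="rwkv-decepticon-70m.pth", strategy="cpu fp32")

state = None  # fixed-size recurrent state; accumulates the whole history
for chunk in ([101, 102, 103], [104, 105, 106]):  # token ids, fed chunk by chunk
    logits, state = model.forward(chunk, state)
# `logits` predicts the next token given all tokens seen so far.
```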
|
|
|
- `rwkv-decepticon-char-20m.pth`: a character-level model; use it with `vocab.json`.
- `rwkv-decepticon-70m.pth` (coming soon): use it with `20B_tokenizer.json`.
- `rwkv-decepticon-170m.pth` (coming soon): trained on a small (6 GB) subset of the SlimPajama dataset; also uses `20B_tokenizer.json`.
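As a usage sketch, the token-level checkpoints should work with the `rwkv` pip package's tokenizer pipeline; the sampling settings below are arbitrary assumptions, and the character-level checkpoint would instead need characters mapped through `vocab.json`:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS  # pip install rwkv

model = RWKV(model="rwkv-decepticon-70m.pth", strategy="cpu fp32")
pipeline = PIPELINE(model, "20B_tokenizer.json")  # tokenizer paired with this checkpoint

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)  # arbitrary sampling settings
print(pipeline.generate("Once upon a time", token_count=100, args=args))
```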
|
|
|
I would like to train a 7B-parameter model but lack the required compute. If you would like to sponsor compute, please contact me.
|
|
|
|
|
|
|
Thank you to the creators of RWKV, who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM