---
license: other
license_name: rwkv-decepticon
license_link: LICENSE
datasets:
- roneneldan/TinyStories
language:
- en
---
# Dataset
This model was trained using the TinyStories dataset, specifically with the GPT-4 version.
# The Model
The name "Decepticon" comes from the model's architecture, which combines elements of both Transformer and RNN architectures: a fusion that is deceptive in name but beneficial in design.
The model was trained with a context length of 1024, though in principle this can be extended through further fine-tuning.
- `rwkv-decepticon-char-20m.pth` is a character-level model and should be used with `vocab.json`.
- `rwkv-decepticon-70m.pth` (coming soon) should be used with `20B_tokenizer.json`.
- `rwkv-decepticon-170m.pth` (coming soon) is trained on a small (6 GB) subset of the SlimPajama dataset and also uses `20B_tokenizer.json`.
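For the character-level model, tokenization is just a per-character lookup. A minimal sketch, assuming `vocab.json` maps single characters to integer token ids (the exact file layout is an assumption; a toy in-memory vocabulary stands in below for illustration):

```python
import json

def encode(text, vocab):
    # Character-level tokenization: one token id per character.
    return [vocab[ch] for ch in text]

def decode(ids, vocab):
    # Invert the character-to-id mapping to recover the string.
    inv = {i: ch for ch, i in vocab.items()}
    return "".join(inv[i] for i in ids)

# In practice the mapping would be loaded from the model's vocab file:
#   with open("vocab.json") as f:
#       vocab = json.load(f)
# Toy vocabulary used here so the sketch is self-contained.
vocab = {"h": 0, "e": 1, "l": 2, "o": 3}

ids = encode("hello", vocab)
print(ids)                  # [0, 1, 2, 2, 3]
print(decode(ids, vocab))   # hello
```

The tokenizer-based checkpoints follow the same encode/decode pattern, just with the subword vocabulary from `20B_tokenizer.json` instead of single characters.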
I would like to train a 7B parameter model but lack the compute required. If you would like to sponsor some compute, please contact me.
Thank you to the creators of RWKV, who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM