---
license: other
license_name: rwkv-decepticon
license_link: LICENSE
datasets:
- roneneldan/TinyStories
language:
- en
---

# Dataset

This model was trained on the TinyStories dataset, specifically the GPT-4 version.

# The Model

The name "Decepticon" stems from the model's architecture, which combines elements of both Transformer and RNN architectures: a deceptive yet beneficial design. The model has a context length of 1024, which can in theory be extended indefinitely through fine-tuning.

```
rwkv-decepticon-char-20m.pth is a character-level model; use it with vocab.json.

rwkv-decepticon-70m.pth (coming soon) is to be used with 20B_tokenizer.json.

rwkv-decepticon-170m.pth (coming soon) is trained on a small subset (6 GB) of the SlimPajama dataset. It also uses 20B_tokenizer.json.
```

I would like to train a 7B-parameter model but lack the required compute. If you would like to sponsor compute, please contact me.

Thank you to the creators of RWKV, who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM
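
To give a feel for the Transformer/RNN fusion mentioned above: RWKV replaces attention with a recurrence (the "WKV" operator), so inference needs only O(1) state per token instead of a growing key/value cache. The sketch below is a simplified scalar version of that recurrence; the parameter names `w` (decay) and `u` (bonus) follow the RWKV paper's conventions, but this is an illustration, not this model's actual implementation, which operates per-channel with numerical stabilization.

```python
import math

def wkv_step(k, v, state, w, u):
    """One step of a simplified, scalar RWKV WKV recurrence.
    w: learned decay; u: learned bonus for the current token;
    state = (a, b) running weighted sums of values and weights."""
    a, b = state
    # Output blends past values with the current token's bonus-weighted value
    out = (a + math.exp(u + k) * v) / (b + math.exp(u + k))
    # Decay the running sums, then fold in the current key/value
    a = math.exp(-w) * a + math.exp(k) * v
    b = math.exp(-w) * b + math.exp(k)
    return out, (a, b)

# Process a toy sequence recurrently, RNN-style: constant-size state per step
state = (0.0, 1e-9)  # tiny epsilon avoids division by zero on the first token
for k, v in [(0.1, 1.0), (0.2, -0.5), (0.0, 0.3)]:
    out, state = wkv_step(k, v, state, w=0.5, u=0.1)
```

Because each step depends only on `(a, b)`, a trained model can be run either in parallel over a sequence (Transformer-style training) or token by token (RNN-style inference).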
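
As a usage sketch for the character-level checkpoint: assuming vocab.json maps single characters to integer token ids (the exact file format is an assumption here), encoding and decoding text is a straightforward table lookup. The helper names below are illustrative, not part of any released API.

```python
import json

def load_char_vocab(path):
    # Assumption: vocab.json is a flat {character: token_id} mapping
    with open(path, encoding="utf-8") as f:
        vocab = json.load(f)
    inv = {i: ch for ch, i in vocab.items()}
    return vocab, inv

def encode(text, vocab):
    # One token per character, as expected by a character-level model
    return [vocab[ch] for ch in text]

def decode(ids, inv):
    return "".join(inv[i] for i in ids)

# Toy illustration with an inline vocabulary (the real vocab.json is larger)
vocab = {"h": 0, "i": 1, " ": 2}
inv = {i: ch for ch, i in vocab.items()}
ids = encode("hi hi", vocab)
assert decode(ids, inv) == "hi hi"
```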