|
---
license: other
license_name: rwkv-decepticon
license_link: LICENSE
datasets:
- roneneldan/TinyStories
language:
- en
---
|
|
|
# Dataset |
|
This model was trained on the TinyStories dataset, specifically the version generated with GPT-4.
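For reference, the dataset is published on the Hugging Face Hub under the ID listed in this card's metadata. A minimal sketch of pulling it with the `datasets` library (the `text` column name reflects the dataset's current layout and is an assumption here, not something specified in this card):

```python
from datasets import load_dataset  # pip install datasets

# Dataset ID taken from this card's metadata.
ds = load_dataset("roneneldan/TinyStories")
print(ds["train"][0]["text"])  # each example is a short story
```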
|
|
|
# The Model |
|
The name "Decepticon" stems from the model's unique architecture, which combines elements of both Transformer and RNN architechtures. This fusion creates a deceptive yet beneficial design. |
|
|
|
The model was trained with a context length of 1024, but in theory the usable context can be extended indefinitely through fine-tuning.
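That claim follows from the RNN view: at inference time the model carries a fixed-size recurrent state, so no attention window caps the history. A minimal sketch using the stateful `forward` of the `rwkv` pip package (the checkpoint path and token ids below are placeholders, not files guaranteed by this repo):

```python
from rwkv.model import RWKV  # pip install rwkv

# Placeholder checkpoint path; the strategy string follows the rwkv package's format.
model = RWKV(model="rwkv-decepticon-70m.pth", strategy="cpu fp32")

state = None  # fixed-size recurrent state; accumulates the whole history
for chunk in ([101, 102, 103], [104, 105, 106]):  # token ids, fed chunk by chunk
    logits, state = model.forward(chunk, state)
# `logits` predicts the next token given all tokens seen so far.
```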
|
|
|
- `rwkv-decepticon-char-20m.pth`: a character-level model; use it with `vocab.json`.
- `rwkv-decepticon-70m.pth` (coming soon): use it with `20B_tokenizer.json`.
- `rwkv-decepticon-170m.pth` (coming soon): trained on a small (6 GB) subset of the SlimPajama dataset; also uses `20B_tokenizer.json`.
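As a usage sketch, the token-level checkpoints should work with the `rwkv` pip package's tokenizer pipeline; the sampling settings below are arbitrary assumptions, and the character-level checkpoint would instead need characters mapped through `vocab.json`:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS  # pip install rwkv

model = RWKV(model="rwkv-decepticon-70m.pth", strategy="cpu fp32")
pipeline = PIPELINE(model, "20B_tokenizer.json")  # tokenizer paired with this checkpoint

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)  # arbitrary sampling settings
print(pipeline.generate("Once upon a time", token_count=100, args=args))
```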
|
|
|
I would like to train a 7B-parameter model but lack the required compute. If you would like to sponsor compute, please contact me.
|
|
|
|
|
|
|
Thank you to the creators of RWKV, who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM