# TamiLM Hebrew Nano

A Modern Hebrew specialized LLM based on the RWKV v6 architecture, trained only on Modern Hebrew datasets with a custom vocabulary optimized for Modern Hebrew.

Trained at the Tel Aviv Makers Hackerspace.
## Parameters

| Parameter | Value |
|---|---|
| Layers | 12 |
| Depth (embedding dim) | 512 |
| Head size | 64 |
| Training context length | 512 |
| Training tokens | 6,841,411,389 (≈6.8 billion) |
| Vocabulary size | 65,536 |
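For orientation, the table maps onto the usual RWKV-style hyperparameter names. This is a minimal sketch, assuming the common RWKV-LM naming conventions (`n_layer`, `n_embd`, etc.); it is not the authors' actual training configuration:

```python
# Hypothetical config mirroring the table above; key names follow common
# RWKV-LM conventions and are assumptions, not taken from this repo.
config = {
    "n_layer": 12,        # Layers
    "n_embd": 512,        # Depth (embedding dimension)
    "head_size": 64,      # n_embd / head_size = 8 heads per layer
    "ctx_len": 512,       # training context length
    "vocab_size": 65536,  # custom Modern Hebrew vocabulary
}

# Sanity check: with a 65,536-token vocab and 512-dim embeddings, the input
# embedding and output head alone account for 2 * 65536 * 512 ≈ 67M weights,
# which dominates the parameter count of a nano-scale model like this one.
print(2 * config["vocab_size"] * config["n_embd"])  # 67,108,864
```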
## Training Compute

All compute was performed on a single NVIDIA P40 card:

- Experiments: 62 hours 52 minutes (≈2.6 days)
- Training run: 208 hours 10 minutes (≈8.7 days)
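Dividing the token count by the wall-clock time gives the average training throughput. A quick check, assuming all 6,841,411,389 tokens were consumed once within the 208 h 10 m run (epochs/repeats are not stated in this card):

```python
# Average throughput on the single P40, derived from the figures above.
tokens = 6_841_411_389
seconds = 208 * 3600 + 10 * 60           # 208 hours 10 minutes
print(f"{tokens / seconds:,.0f} tokens/sec")  # ~9,129 tokens/sec
```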
## How to Run

- Load the model into your favourite RWKV v6 runtime
- Swap the RWKV tokenizer for a Hugging Face `tokenizers` tokenizer
- Load the vocab JSON from this repo (see the sketch below)
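A minimal sketch of these steps, assuming the repo's vocab JSON is a `tokenizers`-compatible `tokenizer.json` and using the `rwkv` pip package as one possible RWKV v6 runtime; the file names here are assumptions, so substitute the actual files shipped in this repo:

```python
from tokenizers import Tokenizer
from rwkv.model import RWKV  # pip install rwkv; one possible RWKV v6 runtime

# Hypothetical file names; replace with the files in this repo.
tokenizer = Tokenizer.from_file("tokenizer.json")  # custom Hebrew vocab
model = RWKV(model="tamilm-hebrew-nano.pth", strategy="cpu fp32")

prompt = "שלום עולם"  # "Hello world" in Hebrew
tokens = tokenizer.encode(prompt).ids

# Greedy decoding loop: feed the prompt, then repeatedly take the argmax token.
out, state = model.forward(tokens, None)
generated = []
for _ in range(50):
    next_id = int(out.argmax())
    generated.append(next_id)
    out, state = model.forward([next_id], state)

print(prompt + tokenizer.decode(generated))
```

Greedy argmax is used here only to keep the sketch short; in practice you would apply your runtime's sampling (temperature/top-p) instead.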