# TamiLM Hebrew Nano

A Modern Hebrew specialized LLM based on the RWKV v6 architecture, trained only on Modern Hebrew datasets with a custom vocabulary optimized for Modern Hebrew.

Trained at the Tel Aviv Makers Hackerspace.
## Parameters

| Parameter | Value |
|---|---|
| Layers | 12 |
| Depth (embedding dim) | 512 |
| Head size | 64 |
| Training context length | 512 |
| Training tokens | 6,841,411,389 (≈6.8 billion) |
| Vocabulary size | 65,536 |
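For orientation, the table maps onto the usual RWKV-style hyperparameter names. This is a minimal sketch, assuming the common RWKV-LM naming conventions (`n_layer`, `n_embd`, etc.); it is not the authors' actual training configuration:

```python
# Hypothetical config mirroring the table above; key names follow common
# RWKV-LM conventions and are assumptions, not taken from this repo.
config = {
    "n_layer": 12,        # Layers
    "n_embd": 512,        # Depth (embedding dimension)
    "head_size": 64,      # n_embd / head_size = 8 heads per layer
    "ctx_len": 512,       # training context length
    "vocab_size": 65536,  # custom Modern Hebrew vocabulary
}

# Sanity check: with a 65,536-token vocab and 512-dim embeddings, the input
# embedding and output head alone account for 2 * 65536 * 512 ≈ 67M weights,
# which dominates the parameter count of a nano-scale model like this one.
print(2 * config["vocab_size"] * config["n_embd"])  # 67,108,864
```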
## Training Compute

All compute was performed on a single NVIDIA P40 card:

- Experiments: 62 hours 52 minutes (≈2.6 days)
- Training run: 208 hours 10 minutes (≈8.7 days)
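Dividing the token count by the wall-clock time gives the average training throughput. A quick check, assuming all 6,841,411,389 tokens were consumed once within the 208 h 10 m run (epochs/repeats are not stated in this card):

```python
# Average throughput on the single P40, derived from the figures above.
tokens = 6_841_411_389
seconds = 208 * 3600 + 10 * 60           # 208 hours 10 minutes
print(f"{tokens / seconds:,.0f} tokens/sec")  # ~9,129 tokens/sec
```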
## How to Run

- Load the model into your favourite RWKV v6 runtime
- Swap the RWKV tokenizer for a Hugging Face `tokenizers` tokenizer
- Load the vocab JSON from this repo (see the sketch below)
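A minimal sketch of these steps, assuming the repo's vocab JSON is a `tokenizers`-compatible `tokenizer.json` and using the `rwkv` pip package as one possible RWKV v6 runtime; the file names here are assumptions, so substitute the actual files shipped in this repo:

```python
from tokenizers import Tokenizer
from rwkv.model import RWKV  # pip install rwkv; one possible RWKV v6 runtime

# Hypothetical file names; replace with the files in this repo.
tokenizer = Tokenizer.from_file("tokenizer.json")  # custom Hebrew vocab
model = RWKV(model="tamilm-hebrew-nano.pth", strategy="cpu fp32")

prompt = "שלום עולם"  # "Hello world" in Hebrew
tokens = tokenizer.encode(prompt).ids

# Greedy decoding loop: feed the prompt, then repeatedly take the argmax token.
out, state = model.forward(tokens, None)
generated = []
for _ in range(50):
    next_id = int(out.argmax())
    generated.append(next_id)
    out, state = model.forward([next_id], state)

print(prompt + tokenizer.decode(generated))
```

Greedy argmax is used here only to keep the sketch short; in practice you would apply your runtime's sampling (temperature/top-p) instead.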