rwkv7-1.5B-world

This is an RWKV-7 model in the flash-linear-attention format.

Model Details

Model Description

  • Developed by: Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
  • Funded by: RWKV Project (Under LF AI & Data Foundation)
  • Model type: RWKV7
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Parameter count: 1.52B
  • Tokenizer: RWKV World tokenizer
  • Vocabulary size: 65,536

Model Sources

Uses

Install flash-linear-attention and transformers >= 4.48.0 before using this model:

pip install git+https://github.com/fla-org/flash-linear-attention
pip install 'transformers>=4.48.0'
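
A quick sanity check of the environment (this assumes flash-linear-attention installs the fla package, as in its repository; adjust if your install differs):

# Verify both packages import and print the installed transformers version.
python -c "import fla, transformers; print(transformers.__version__)"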

Direct Use

You can use this model just like any other Hugging Face model:

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-1.5B-world', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-1.5B-world', trust_remote_code=True)
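
A minimal generation example continuing from the snippet above (the prompt and decoding settings are illustrative, not prescribed by the model card):

# Encode an illustrative prompt; any text works with the RWKV World tokenizer.
prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of up to 64 new tokens; raise max_new_tokens or enable sampling as needed.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))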

Training Details

Training Data

This model was trained on the World v3 dataset, totaling 3.119 trillion tokens.

Training Hyperparameters

  • Training regime: bfloat16; learning rate decayed from 4e-4 to 1e-5 with a "delayed" cosine schedule; weight decay 0.1; batch size increased partway through training (a sketch of the schedule follows this list)
  • Final Loss: 1.9965
  • Token Count: 3.119 trillion
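
The exact shape of the "delayed" cosine decay is not spelled out here. A minimal sketch, assuming the learning rate is held at its peak for an initial fraction of training before the cosine decay begins (the delay_frac value and function name are hypothetical; only the 4e-4 to 1e-5 cosine decay is stated by the card):

import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    # Hold the peak learning rate for the first delay_frac of training (assumed),
    # then cosine-decay from lr_max down to lr_min over the remaining steps.
    delay_steps = int(delay_frac * total_steps)
    if step < delay_steps:
        return lr_max
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))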

Evaluation

Metrics

lambada_openai:

  • before conversion (original RWKV checkpoint): ppl 4.13, acc 69.4%
  • after conversion (flash-linear-attention format): ppl 4.26, acc 68.8%
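
The card does not state how these numbers were measured; one common way to reproduce the lambada_openai scores for the converted checkpoint is EleutherAI's lm-evaluation-harness (the command below is an assumption about the setup, not the authors' exact procedure):

pip install lm_eval
lm_eval --model hf \
    --model_args pretrained=fla-hub/rwkv7-1.5B-world,trust_remote_code=True \
    --tasks lambada_openai \
    --batch_size 8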

FAQ

Q: The safetensors metadata is none.

A: Upgrade transformers to >= 4.48.0: pip install 'transformers>=4.48.0'
