---
base_model:
- BlinkDL/rwkv-7-pile
datasets:
- EleutherAI/the_pile_deduplicated
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: rwkv
---

# rwkv7-168M-pile

<!-- Provide a quick summary of what the model is/does. -->

This is an RWKV-7 model in the flash-linear-attention format.

## Model Details


### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
- **Funded by:** RWKV Project (Under LF AI & Data Foundation)
- **Model type:** RWKV7
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Parameter count:** 168M
- **Tokenizer:** GPT-NeoX 20B tokenizer

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
- **Paper:** https://huggingface.co/papers/2503.14456
- **Weights:** Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Install `flash-linear-attention` and `transformers` >= 4.48.0 before using this model:

```bash
pip install git+https://github.com/fla-org/flash-linear-attention
pip install 'transformers>=4.48.0'
```

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
You can use this model just like any other Hugging Face model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
```
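
A minimal generation sketch building on the snippet above; the prompt and sampling parameters here are illustrative, not tuned recommendations:
```python
prompt = "The Pile is a large, diverse text dataset"
inputs = tokenizer(prompt, return_tensors='pt')
# Sample up to 64 new tokens from the model.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                         temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```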

## Training Details

### Training Data

This model was trained on the Pile, for a total of 332 billion tokens.

#### Training Hyperparameters

- **Training regime:** bfloat16; learning rate 8e-4 decayed to 3e-5 on a cosine schedule (sketched below); weight decay 0.1; batch size 8x30x4096
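
A minimal sketch of the cosine decay schedule described above, assuming a standard cosine interpolation between the peak and final learning rates (any warmup phase is omitted, and the exact implementation used in training may differ):
```python
import math

def cosine_decay_lr(step: int, total_steps: int,
                    peak_lr: float = 8e-4, final_lr: float = 3e-5) -> float:
    """Cosine decay from peak_lr at step 0 to final_lr at total_steps."""
    progress = min(step / total_steps, 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```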

## Evaluation

### Metrics

| Task | Perplexity | Accuracy |
|------|------------|----------|
| `lambada_openai` | 14.2 | 45.6% |
| `piqa` | n/a | 65.5% |
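
These task names match those used by EleutherAI's `lm-evaluation-harness`. A sketch of how the numbers could be reproduced with it (the batch size is an arbitrary choice, and results may vary slightly across harness versions):
```bash
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=fla-hub/rwkv7-168M-pile,trust_remote_code=True \
  --tasks lambada_openai,piqa \
  --batch_size 32
```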

## FAQ
Q: The safetensors metadata is none.

A: Upgrade `transformers` to >= 4.48.0: `pip install 'transformers>=4.48.0'`