---
base_model:
- BlinkDL/rwkv-7-pile
datasets:
- EleutherAI/the_pile_deduplicated
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: rwkv
---
# rwkv7-168M-pile
<!-- Provide a quick summary of what the model is/does. -->
This is an RWKV-7 model in the flash-linear-attention format.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
- **Funded by:** RWKV Project (under LF AI & Data Foundation)
- **Model type:** RWKV7
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Parameter count:** 168M
- **Tokenizer:** GPT-NeoX 20B tokenizer
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
- **Paper:** https://huggingface.co/papers/2503.14456
- **Weights:** Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Install `flash-linear-attention` and the latest version of `transformers` before using this model:
```bash
pip install git+https://github.com/fla-org/flash-linear-attention
pip install 'transformers>=4.48.0'
```
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
You can use this model just like any other Hugging Face model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
```
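For example, here is a minimal text-generation sketch. The prompt and sampling settings are illustrative choices, not recommendations from the model authors; it assumes the standard `transformers` generation API works with this remote-code model as claimed above.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
model.eval()

# Illustrative prompt; this is a plain language model trained on the Pile,
# so it expects free-form English text rather than a chat template.
prompt = "The Pile is a large, diverse dataset for language modeling"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Sampling settings here are illustrative defaults, not tuned values.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```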
## Training Details
### Training Data
This model was trained on the Pile with a total of 332 billion tokens.
#### Training Hyperparameters
- **Training regime:** bfloat16, lr 8e-4 to 3e-5 cosine decay, wd 0.1, bsz 8x30x4096
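For reference, here is a minimal sketch of the stated learning-rate schedule, assuming a plain cosine decay from 8e-4 to 3e-5 over the whole run; the authoritative schedule is defined in the RWKV-LM training code.
```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float = 8e-4, lr_min: float = 3e-5) -> float:
    """Cosine decay from lr_max to lr_min over total_steps (warmup omitted)."""
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# cosine_lr(0, 100_000) -> 8e-4; cosine_lr(100_000, 100_000) -> 3e-5
```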
## Evaluation
#### Metrics
- `lambada_openai`: ppl 14.2, acc 45.6%
- `piqa`: acc 65.5%
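One possible way to reproduce numbers like these (an assumption, not the authors' documented procedure) is EleutherAI's `lm-evaluation-harness`, which can evaluate Hugging Face models directly:
```python
# Reproduction sketch using EleutherAI's lm-evaluation-harness (pip install lm_eval).
# The harness version, batch size, and other settings used for the reported
# numbers are not documented here, so results may differ slightly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fla-hub/rwkv7-168M-pile,trust_remote_code=True",
    tasks=["lambada_openai", "piqa"],
)
print(results["results"])
```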
## FAQ
Q: The safetensors metadata is none.

A: Upgrade `transformers` to >= 4.48.0: `pip install 'transformers>=4.48.0'`