---
language:
- en
tags:
- ggml
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---
**Last updated:** 2023-06-07
This is [BlinkDL/rwkv-4-pileplus](https://huggingface.co/BlinkDL/rwkv-4-pileplus) converted to GGML for use with rwkv.cpp and KoboldCpp. [rwkv.cpp's conversion instructions](https://github.com/saharNooby/rwkv.cpp#option-32-convert-and-quantize-pytorch-model) were followed.
**NOTE:** If you're like me and you want to run this model on a 32-bit ARM processor, keep in mind that KoboldCpp/llama.cpp and similar projects don't yet have support for 32-bit ARM as of 2023-07-22. You'll need to compile a 64-bit ARM binary (easiest done through a 64-bit ARM system) and then run it through [QEMU user space emulation](https://www.qemu.org/docs/master/user/main.html) (slow) or [QEMU full system emulation](https://wiki.debian.org/QEMU#Setting_up_a_testing.2Funstable_system) (slower).
Running a 3B model on an emulated x86-64 system (on my PC, no less) felt like roughly one token every 30 seconds, so the payoff may not be worth it until official support is implemented.
### RAM USAGE (KoboldCpp)
Model | RAM usage (with OpenBLAS)
:--:|:--:
Unloaded | 41.3 MiB
169M q4_0 | 232.2 MiB
169M q5_0 | 243.3 MiB
169M q5_1 | 249.2 MiB
430M q4_0 | 413.2 MiB
430M q5_0 | 454.4 MiB
430M q5_1 | 471.8 MiB
1.5B q4_0 | 1.1 GiB
1.5B q5_0 | 1.3 GiB
1.5B q5_1 | 1.3 GiB
3B q4_0 | 2.0 GiB
3B q5_0 | 2.3 GiB
3B q5_1 | 2.4 GiB
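To make the table easier to act on, here is a minimal sketch that picks which of the listed models fit in a given amount of free RAM. The figures are copied from the table above (GiB rows converted at 1 GiB = 1024 MiB); the function name `models_that_fit` and the `headroom_mib` parameter are hypothetical and exist only for this illustration. They are not part of KoboldCpp or rwkv.cpp.

```python
# RAM usage in MiB, copied from the table above (KoboldCpp with OpenBLAS).
# GiB rows are converted at 1 GiB = 1024 MiB; the table rounds them, so treat
# those entries as approximate.
RAM_MIB = {
    ("169M", "q4_0"): 232.2,
    ("169M", "q5_0"): 243.3,
    ("169M", "q5_1"): 249.2,
    ("430M", "q4_0"): 413.2,
    ("430M", "q5_0"): 454.4,
    ("430M", "q5_1"): 471.8,
    ("1.5B", "q4_0"): 1.1 * 1024,
    ("1.5B", "q5_0"): 1.3 * 1024,
    ("1.5B", "q5_1"): 1.3 * 1024,
    ("3B", "q4_0"): 2.0 * 1024,
    ("3B", "q5_0"): 2.3 * 1024,
    ("3B", "q5_1"): 2.4 * 1024,
}


def models_that_fit(free_mib, headroom_mib=0.0):
    """Return (size, quant) pairs whose listed RAM usage fits, largest first.

    headroom_mib is a hypothetical safety margin reserved for everything
    else running on the machine.
    """
    budget = free_mib - headroom_mib
    fits = [(usage, key) for key, usage in RAM_MIB.items() if usage <= budget]
    fits.sort(reverse=True)  # highest RAM usage first
    return [key for _usage, key in fits]


if __name__ == "__main__":
    # Example: a device with 512 MiB free.
    for size, quant in models_that_fit(512):
        print(size, quant)
```

These are single measurements from my setup, so leave some headroom rather than treating the numbers as hard limits.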
Original model card by BlinkDL is below.
* * *
# RWKV-4 PilePlus
## Model Description
RWKV-4-pile models finetuned on [RedPajama + some of Pile v2 = 1.7T tokens]. Updated with 2020+2021+2022 data, and better at all European languages.
Although some of these are intermediate checkpoints (XXXGtokens means finetuned for XXXG tokens), you can already use them because I am finetuning from Pile models (instead of retraining).
Note: these are not instruct-tuned yet, but they are recommended as replacements for the vanilla Pile models.
7B and 14B coming soon.
See https://github.com/BlinkDL/RWKV-LM for details.
Use https://github.com/BlinkDL/ChatRWKV to run it.