metadata

language:
  - en
tags:
  - ggml
  - text-generation
  - causal-lm
  - rwkv
license: apache-2.0
datasets:
  - EleutherAI/pile
  - togethercomputer/RedPajama-Data-1T

Last updated: 2023-06-07

This is BlinkDL/rwkv-4-pileplus converted to GGML for use with rwkv.cpp and KoboldCpp. rwkv.cpp's conversion instructions were followed.
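As a rough sketch of what following those instructions involves, the snippet below only builds the two command lines (convert the PyTorch checkpoint to an FP16 GGML file, then quantize it) without running anything. The script paths and the `FP16` / `Q5_1` arguments are assumptions based on the rwkv.cpp README from around the time of this conversion; other versions of the repository may place the scripts elsewhere. The checkpoint file name and the output naming scheme are hypothetical.

```python
import sys
from pathlib import Path

# Assumption: script locations as in the rwkv.cpp README around mid-2023.
# Other versions of the repository may keep them elsewhere; check your checkout.
CONVERT_SCRIPT = "rwkv/convert_pytorch_to_ggml.py"
QUANTIZE_SCRIPT = "rwkv/quantize.py"


def conversion_commands(pth_path, out_dir, quant="Q5_1"):
    """Build (but do not run) the two commands: .pth -> FP16 GGML -> quantized GGML."""
    stem = Path(pth_path).stem
    # Output names are a hypothetical naming scheme, not one the scripts require.
    fp16_path = Path(out_dir) / f"{stem}-f16.bin"
    quant_path = Path(out_dir) / f"{stem}-{quant.lower()}.bin"
    convert = [sys.executable, CONVERT_SCRIPT, str(pth_path), str(fp16_path), "FP16"]
    quantize = [sys.executable, QUANTIZE_SCRIPT, str(fp16_path), str(quant_path), quant]
    return convert, quantize


if __name__ == "__main__":
    # "RWKV-4-PilePlus-169M.pth" is a placeholder file name for illustration.
    for cmd in conversion_commands("RWKV-4-PilePlus-169M.pth", "out", quant="Q5_1"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True, cwd=...)` with `cwd` pointing at a local rwkv.cpp checkout.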

NOTE: If you're like me and you want to run this model on a 32-bit ARM processor, keep in mind that KoboldCpp, llama.cpp, and similar projects did not yet support 32-bit ARM as of 2023-07-22. You'll need to compile a 64-bit ARM binary (most easily done on a 64-bit ARM system) and then run it through QEMU user-mode emulation (slow) or QEMU full-system emulation (slower).

Running a 3B model on an emulated x86-64 (on my PC, no less) gave me a speed that felt like one token every 30 seconds, so the payoff may not be worth it until official support is implemented.

RAM USAGE (KoboldCpp)

Model	RAM usage (with OpenBLAS)
Unloaded	41.3 MiB
169M q4_0	232.2 MiB
169M q5_0	243.3 MiB
169M q5_1	249.2 MiB
430M q4_0	413.2 MiB
430M q5_0	454.4 MiB
430M q5_1	471.8 MiB
1.5B q4_0	1.1 GiB
1.5B q5_0	1.3 GiB
1.5B q5_1	1.3 GiB
3B q4_0	2.0 GiB
3B q5_0	2.3 GiB
3B q5_1	2.4 GiB
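
To make the table easier to act on, here is a minimal sketch that lists which of the measured models fit into a given amount of free RAM. The figures are copied from the table (GiB rows converted at 1 GiB = 1024 MiB, so they inherit the table's rounding); the function name and the `headroom_mib` parameter are hypothetical, and real usage will vary with context length and backend.

```python
# RAM figures copied from the table above, in MiB (1 GiB = 1024 MiB).
# The GiB rows are rounded to one decimal in the table, so treat them as approximate.
RAM_USAGE_MIB = {
    ("169M", "q4_0"): 232.2,
    ("169M", "q5_0"): 243.3,
    ("169M", "q5_1"): 249.2,
    ("430M", "q4_0"): 413.2,
    ("430M", "q5_0"): 454.4,
    ("430M", "q5_1"): 471.8,
    ("1.5B", "q4_0"): 1.1 * 1024,
    ("1.5B", "q5_0"): 1.3 * 1024,
    ("1.5B", "q5_1"): 1.3 * 1024,
    ("3B", "q4_0"): 2.0 * 1024,
    ("3B", "q5_0"): 2.3 * 1024,
    ("3B", "q5_1"): 2.4 * 1024,
}


def models_that_fit(free_ram_mib, headroom_mib=0.0):
    """Return (size, quant, ram_mib) entries that fit the budget, largest first."""
    budget = free_ram_mib - headroom_mib
    fitting = [
        (size, quant, ram)
        for (size, quant), ram in RAM_USAGE_MIB.items()
        if ram <= budget
    ]
    return sorted(fitting, key=lambda entry: entry[2], reverse=True)


if __name__ == "__main__":
    # Example: 1 GiB of free RAM, keeping 256 MiB spare for everything else.
    for size, quant, ram in models_that_fit(1024, headroom_mib=256):
        print(f"{size} {quant}: {ram:.1f} MiB")
```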

Original model card by BlinkDL is below.

RWKV-4 PilePlus

Model Description

RWKV-4-pile models finetuned on [RedPajama + some of Pile v2 = 1.7T tokens]. Updated with 2020+2021+2022 data, and better at all European languages.

Although some of these are intermediate checkpoints (XXXGtokens means finetuned for XXXG tokens), you can already use them because I am finetuning from Pile models (instead of retraining).

Note: these models are not instruct-tuned yet, but they are recommended as replacements for the vanilla Pile models.

7B and 14B coming soon.

See https://github.com/BlinkDL/RWKV-LM for details.

Use https://github.com/BlinkDL/ChatRWKV to run it.