language:
  - en
tags:
  - ggml
  - text-generation
  - causal-lm
  - rwkv
license: apache-2.0
datasets:
  - EleutherAI/pile
  - togethercomputer/RedPajama-Data-1T

Last updated: 2023-06-07

This is BlinkDL/rwkv-4-pileplus converted to GGML for use with rwkv.cpp and KoboldCpp. rwkv.cpp's conversion instructions were followed.

NOTE: If, like me, you want to run this model on a 32-bit ARM processor, keep in mind that KoboldCpp/llama.cpp and similar projects do not support 32-bit ARM as of 2023-07-22. You will need to compile a 64-bit ARM binary (easiest on a 64-bit ARM system) and then run it through QEMU user-space emulation (slow) or QEMU full-system emulation (slower).

Running a 3B model on an emulated x86-64 (on my PC, no less) gave me roughly one token every 30 seconds, so the payoff may not be worth it until official support is implemented.

RAM USAGE (KoboldCpp)

| Model | RAM usage (with OpenBLAS) |
|---|---|
| Unloaded | 41.3 MiB |
| 169M q4_0 | 232.2 MiB |
| 169M q5_0 | 243.3 MiB |
| 169M q5_1 | 249.2 MiB |
| 430M q4_0 | 413.2 MiB |
| 430M q5_0 | 454.4 MiB |
| 430M q5_1 | 471.8 MiB |
| 1.5B q4_0 | 1.1 GiB |
| 1.5B q5_0 | 1.3 GiB |
| 1.5B q5_1 | 1.3 GiB |
| 3B q4_0 | 2.0 GiB |
| 3B q5_0 | 2.3 GiB |
| 3B q5_1 | 2.4 GiB |
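
If you want to check which of these files fits a given amount of free RAM, the table above can be turned into a small lookup. This is only a rough sketch: the figures are copied from the table, the GiB entries are converted assuming 1 GiB = 1024 MiB, and the function name `models_that_fit` and the `headroom_mib` parameter are made up for illustration. It also assumes each figure is the total RAM KoboldCpp used with that model loaded, which the table does not state explicitly.

```python
# RAM figures copied from the table above (KoboldCpp with OpenBLAS), in MiB.
# Assumption: GiB entries are converted with 1 GiB = 1024 MiB.
RAM_USAGE_MIB = {
    ("169M", "q4_0"): 232.2,
    ("169M", "q5_0"): 243.3,
    ("169M", "q5_1"): 249.2,
    ("430M", "q4_0"): 413.2,
    ("430M", "q5_0"): 454.4,
    ("430M", "q5_1"): 471.8,
    ("1.5B", "q4_0"): 1.1 * 1024,
    ("1.5B", "q5_0"): 1.3 * 1024,
    ("1.5B", "q5_1"): 1.3 * 1024,
    ("3B", "q4_0"): 2.0 * 1024,
    ("3B", "q5_0"): 2.3 * 1024,
    ("3B", "q5_1"): 2.4 * 1024,
}


def models_that_fit(free_ram_mib, headroom_mib=0.0):
    """Return (size, quant, usage_mib) tuples that fit the budget, largest first.

    `headroom_mib` is a hypothetical safety margin left for everything else
    running on the machine; it is not something KoboldCpp itself defines.
    """
    budget = free_ram_mib - headroom_mib
    fits = [
        (size, quant, usage)
        for (size, quant), usage in RAM_USAGE_MIB.items()
        if usage <= budget
    ]
    # Sort by RAM usage, largest first, so the first entry is the biggest fit.
    return sorted(fits, key=lambda entry: entry[2], reverse=True)


if __name__ == "__main__":
    # With 512 MiB free, the largest entry that fits is 430M q5_1.
    print(models_that_fit(512)[0])  # ('430M', 'q5_1', 471.8)
```

These numbers were measured on one setup, so treat the result as a starting point and leave some headroom for the context and the rest of the system.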

Original model card by BlinkDL is below.


RWKV-4 PilePlus

Model Description

RWKV-4-pile models finetuned on [RedPajama + some of Pile v2 = 1.7T tokens]. Updated with 2020+2021+2022 data, and better at all European languages.

Although some of these are intermediate checkpoints (XXXGtokens means finetuned for XXXG tokens), you can already use them because I am finetuning from Pile models (instead of retraining).

Note: these models are not instruct-tuned yet, but they are recommended as replacements for the vanilla Pile models.

7B and 14B coming soon.

See https://github.com/BlinkDL/RWKV-LM for details.

Use https://github.com/BlinkDL/ChatRWKV to run it.