Update README.md
README.md CHANGED
@@ -16,6 +16,10 @@ datasets:
This is [BlinkDL/rwkv-4-pileplus](https://huggingface.co/BlinkDL/rwkv-4-pileplus) converted to GGML for use with rwkv.cpp and KoboldCpp. [rwkv.cpp's conversion instructions](https://github.com/saharNooby/rwkv.cpp#option-32-convert-and-quantize-pytorch-model) were followed.
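For reference, those instructions boil down to two commands run from a rwkv.cpp checkout. This is a sketch, not the authoritative procedure: the checkpoint filename below is a placeholder, and the script paths reflect rwkv.cpp as of mid-2023 and may have moved since.

```sh
# Convert the PyTorch checkpoint (.pth) to an FP16 GGML file
# (filenames here are placeholders)
python rwkv/convert_pytorch_to_ggml.py RWKV-4-PilePlus-3B.pth rwkv-4-pileplus-3b-f16.bin FP16

# Optionally quantize the result to cut RAM usage
python rwkv/quantize.py rwkv-4-pileplus-3b-f16.bin rwkv-4-pileplus-3b-q5_1.bin Q5_1
```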
+ **NOTE:** If you're like me and want to run this model on a 32-bit ARM processor, keep in mind that KoboldCpp/llama.cpp and similar projects don't yet support 32-bit ARM as of 2023-07-22. You'll need to compile a 64-bit ARM binary (easiest done on a 64-bit ARM system) and then run it through [QEMU user space emulation](https://www.qemu.org/docs/master/user/main.html) (slow) or [QEMU full system emulation](https://wiki.debian.org/QEMU#Setting_up_a_testing.2Funstable_system) (slower); a sketch of the user-space route follows the added lines below.
+
+ Running a 3B model on an emulated x86-64 (on my PC, no less) gave me what felt like one token every 30 seconds, so the payoff may not be worth it until official support is implemented.
+
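A rough sketch of the user-space route, assuming a Debian-style system with cross libraries available; the build steps depend on the project and version, and `./aarch64-binary` is a placeholder for whatever your build actually produces:

```sh
# On an aarch64 machine (or with an aarch64 cross toolchain), build the project:
git clone https://github.com/LostRuins/koboldcpp && cd koboldcpp
make

# On the 32-bit ARM host, install user-mode QEMU and run the aarch64 build.
# -L points QEMU at an aarch64 library root; the path assumes Debian's
# cross libraries are installed there.
sudo apt install qemu-user
qemu-aarch64 -L /usr/lib/aarch64-linux-gnu ./aarch64-binary
```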
### RAM USAGE (KoboldCpp)
Model | RAM usage (with OpenBLAS)
:--:|:--: