---
license: llama2
pipeline_tag: text-generation
---

These are exl2 quants of [Goliath-longLORA-120b-rope8-32k-fp16](https://huggingface.co/grimulkan/Goliath-longLORA-120b-rope8-32k-fp16), which combines Goliath with 32k context.

I did not create that model; I only discovered it and wanted to try it for myself, so I made smaller quants.
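
For reference, here is a minimal sketch of loading one of these quants with the exllamav2 Python API. The local path, the 32k sequence length and the sampler settings are assumptions to adjust for your own setup:

```python
# Minimal exllamav2 loading sketch; paths and settings are placeholders
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./Goliath-longLORA-120b-rope8-32k-exl2"  # local copy of one branch
config.prepare()
config.max_seq_len = 32768  # the model targets 32k context

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # FP16 KV cache
model.load_autosplit(cache)               # spread layers across the available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time", settings, 200))
```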

# Available versions

[main](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/main) holds the measurement files for the default exl2 calibration dataset and for the one used by [goliath-120b-exl2-rpcal](https://huggingface.co/Panchovix/goliath-120b-exl2-rpcal).

[2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using the default dataset

[3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using the default dataset

[4bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4bpw) using the default dataset

[4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using the default dataset

[4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using the PIPPA dataset
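
Each version sits on its own branch, so download only the revision you want; a sketch using `huggingface_hub` (the local directory name is just an example):

```python
# Fetch a single quant branch (revision) from this repo
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="aikitoria/Goliath-longLORA-120b-rope8-32k-exl2",
    revision="4bpw",  # branch name from the list above
    local_dir="Goliath-longLORA-120b-rope8-32k-exl2-4bpw",  # example target directory
)
```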

# Memory usage tests

### 2.65bpw

context 16k, cache 16: 46.9 GiB (fits in 2x 3090)

context 32k, cache 8: 47 GiB (fits in 2x 3090)

### 3bpw

context 8k, cache 16: 47.4 GiB (fits in 2x 3090)

context 16k, cache 8: 47.4 GiB (fits in 2x 3090)

### 4.35bpw

context 16k, cache 16: 70.1 GiB (fits in 3x 3090)

context 32k, cache 8: 70.3 GiB (fits in 3x 3090)

context 32k, cache 16: 78.7 GiB (fits in A100 80GB)
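
For the cache settings: exllamav2 offers an FP16 cache and an 8-bit cache (the 8-bit one roughly halves the cache's VRAM use at the same context length), which is what the "cache 16" / "cache 8" entries above correspond to. A minimal sketch of switching between them, assuming a local copy of one branch:

```python
# FP16 vs 8-bit KV cache in exllamav2; the path is a placeholder
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Cache_8bit, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "./Goliath-longLORA-120b-rope8-32k-exl2"
config.prepare()
config.max_seq_len = 32768  # the "context 32k" rows above

model = ExLlamaV2(config)

# "cache 16": regular FP16 cache
# cache = ExLlamaV2Cache(model, lazy=True)

# "cache 8": 8-bit cache, trades a little quality for roughly half the cache VRAM
cache = ExLlamaV2Cache_8bit(model, lazy=True)

model.load_autosplit(cache)
```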

# Super epic scientific test results

- The 2.65bpw version suffered greatly from the quantization; it's not completely broken, but it's no good either.

- The 3bpw version hasn't suffered as much and is much more usable than the 2.65bpw one.

- The 4bpw version is the one to use with CFG, since CFG requires more memory for the context.

- The 4.35bpw version is a bit worse than the normal 4k Goliath, but better than Goliath with a RoPE scale applied for 8k+ context.

- The version quantized on the PIPPA dataset produces worse results than the one using the default dataset at any context length.

My current strategy is to use the original Goliath until its context is full and then switch over to this model.