lucyknada committed
Commit 6fd6afb
Parent: c335120

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
```diff
@@ -38,11 +38,17 @@ Can I ask a question?<|im_end|>
 
 ## Support
 
-In order to inference this model you will have to use Aphrodite or vLLM as llama.cpp has not yet merged the required pull request to fix llama3.1 rope_freqs not respecting custom head_dim - You can however get around this by quanting the model yourself with the following fixes for a working GGUF. However, it will be stuck at 8k context until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged.
+To run inference on this model, you'll need to use Aphrodite or vLLM, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.
 
-1. Remove `"rope_scaling": {}` from `config.json`
+However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.
+
+To create a working GGUF file, make the following adjustments:
+
+1. Remove the `"rope_scaling": {}` entry from `config.json`
 2. Change `"max_position_embeddings"` to `8192` in `config.json`
 
+These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
+
 ## Credits
 
 - [anthracite-org/Stheno-Data-Filtered](https://huggingface.co/datasets/anthracite-org/Stheno-Data-Filtered)
```
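For reference, a minimal sketch of the "use Aphrodite or vLLM" route via vLLM's offline API. The repo id below is a placeholder (the actual model id isn't shown in this diff), and the ChatML-style prompt wrapping is an assumption inferred from the `<|im_end|>` token in the hunk header:

```python
# Minimal vLLM inference sketch.
# Assumption: "<org>/<model-repo-id>" is a placeholder; substitute the
# actual Hugging Face repo id of this model.
from vllm import LLM, SamplingParams

llm = LLM(model="<org>/<model-repo-id>")  # placeholder, not the real id
params = SamplingParams(temperature=0.8, max_tokens=256)

# Assumption: ChatML-style formatting, inferred from the <|im_end|>
# token visible in the diff's hunk context above.
prompt = "<|im_start|>user\nCan I ask a question?<|im_end|>\n<|im_start|>assistant\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```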
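And a sketch of the two `config.json` edits from the README's numbered steps, applied to a local copy of the model before GGUF conversion. The file path is a placeholder:

```python
# Apply the README's two config.json fixes prior to quantizing to GGUF.
# Assumption: "path/to/model/config.json" is a placeholder for your
# local copy of the model's config file.
import json

path = "path/to/model/config.json"

with open(path) as f:
    config = json.load(f)

config.pop("rope_scaling", None)          # step 1: remove the "rope_scaling": {} entry
config["max_position_embeddings"] = 8192  # step 2: cap context at 8192

with open(path, "w") as f:
    json.dump(config, f, indent=2)
```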