|
---
license: apache-2.0
datasets:
- allenai/tulu-3-sft-personas-instruction-following
- PocketDoc/Dans-Prosemaxx-Gutenberg
- ToastyPigeon/SpringDragon-Instruct
- allura-org/fujin-cleaned-stage-2
base_model: ToastyPigeon/Ruby-Music-8B
tags:
- llama-cpp
- gguf-my-repo
---
|
|
|
# Triangle104/Ruby-Music-8B-Q5_K_S-GGUF |
|
This model was converted to GGUF format from [`ToastyPigeon/Ruby-Music-8B`](https://huggingface.co/ToastyPigeon/Ruby-Music-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/ToastyPigeon/Ruby-Music-8B) for more details on the model. |
|
|
|
--- |
|
Note that this model is based on InternLM3, not Llama 3.

A roleplaying/creative-writing fine-tune of [internlm/internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct), provided as an alternative to L3 8B for folks with 8GB VRAM.
|
|
|
|
|
This was trained on a mix of private instruct (~1k samples) and roleplaying (~2.5k human and ~1k synthetic samples), along with the following public datasets:

- allenai/tulu-3-sft-personas-instruction-following (~500 samples)
- PocketDoc/Dans-Prosemaxx-Gutenberg (all samples)
- ToastyPigeon/SpringDragon-Instruct (~500 samples)
- allura-org/fujin-cleaned-stage-2 (~500 samples)
|
|
|
|
|
The instruct format is standard ChatML:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{assistant response}<|im_end|>
```
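
For example, a single exchange rendered in this format might look like the following (the system prompt and messages are purely illustrative, not taken from the training data):

```
<|im_start|>system
You are the narrator of an interactive story. Write in third person, past tense.<|im_end|>
<|im_start|>user
The caravan reaches the edge of the desert at dusk. What does Mira see?<|im_end|>
<|im_start|>assistant
Mira shaded her eyes against the last of the light and counted the dunes rolling out ahead of them.<|im_end|>
```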
|
Recommended sampler settings:

- temp 1
- smoothing factor 0.5, smoothing curve 1
- DRY 0.5/1.75/5/1024

There may be better sampler settings, but these have at least proven stable in my testing. InternLM3 requires aggressive tail filtering (high min-p, top-a, or something similar) to avoid making strange typos and spelling mistakes. Note: this might be a current issue with llama.cpp and the GGUF versions I tested.
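
The smoothing and DRY settings above are the kind usually configured in a frontend. If you're running the GGUF directly with llama.cpp, the command below is a rough sketch of an equivalent setup under a few assumptions: quadratic smoothing isn't exposed there, so a min-p of 0.1 (my guess, not a value from this card) stands in as the tail filter, and the four DRY numbers are mapped to multiplier/base/allowed-length/penalty-range. The DRY flags need a reasonably recent llama.cpp build, and exact flag names may differ across versions:

```bash
# Approximate the recommended samplers with llama-server (recent llama.cpp build assumed).
# --min-p substitutes for smoothing factor/curve, which llama.cpp does not expose.
llama-server --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf \
  --temp 1.0 \
  --min-p 0.1 \
  --dry-multiplier 0.5 \
  --dry-base 1.75 \
  --dry-allowed-length 5 \
  --dry-penalty-last-n 1024
```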
|
Notes:

I noticed this model has trouble outputting the EOS token sometimes (despite confirming that `<|im_end|>` appears at the end of every turn in the training data). This can cause it to ramble at the end of a message instead of ending its turn.

You can either cut the end out of the messages until it picks up the right response length, or use logit bias. I've had success getting right-sized turns by setting the logit bias for `<|im_end|>` to 2.
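
If you're running through llama.cpp directly, one way to apply that bias (a sketch, not from the original card) is the `--logit-bias` flag of llama-cli. It takes a token id rather than a token string, so you'd first need to look up the id of `<|im_end|>` in this model's tokenizer; the id below is a placeholder, not the real value:

```bash
# Placeholder: replace 12345 with the actual token id of <|im_end|> for this model
# (check the base repo's tokenizer config or llama.cpp's model-load output).
llama-cli --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf \
  --logit-bias 12345+2 -cnv
```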
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash
brew install llama.cpp
```
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf -c 2048 |
|
``` |
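
Once the server is up, you can also query it through its OpenAI-compatible chat endpoint. The request below is a generic illustration (default host and port, made-up messages), not something from the original card:

```bash
# Send a chat request to the running llama-server (default listen address shown).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a creative writing assistant."},
      {"role": "user", "content": "Write the opening line of a short story set in a lighthouse."}
    ],
    "temperature": 1.0
  }'
```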
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
``` |
|
git clone https://github.com/ggerganov/llama.cpp |
|
``` |
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
|
``` |
|
cd llama.cpp && LLAMA_CURL=1 make |
|
``` |
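
For example, on a Linux machine with an Nvidia GPU (assuming the CUDA toolkit is installed), that would be:

```
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```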
|
|
|
Step 3: Run inference through the main binary. |
|
``` |
|
./llama-cli --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf -p "The meaning to life and the universe is" |
|
``` |
|
or |
|
``` |
|
./llama-server --hf-repo Triangle104/Ruby-Music-8B-Q5_K_S-GGUF --hf-file ruby-music-8b-q5_k_s.gguf -c 2048 |
|
``` |
|
|