Triangle104/Ruby-Music-8B-Q5_K_S-GGUF

Triangle104 committed on Feb 1

Commit 9dea40d · verified · 1 Parent(s): 95833e6

Update README.md

Files changed (1)

README.md +71 -0

@@ -15,6 +15,77 @@ tags:
 This model was converted to GGUF format from [`ToastyPigeon/Ruby-Music-8B`](https://huggingface.co/ToastyPigeon/Ruby-Music-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/ToastyPigeon/Ruby-Music-8B) for more details on the model.
+---
+Note that this model is based on InternLM3, not LLaMA 3.
+A roleplaying/creative-writing fine-tune of internlm/internlm3-8b-instruct, provided as an alternative to L3 (LLaMA 3) 8B for folks with 8GB VRAM.
+This was trained on a mix of private instruct data (~1k samples) and roleplaying data (~2.5k human and ~1k synthetic samples), along with the following public datasets:
+- allenai/tulu-3-sft-personas-instruction-following (~500 samples)
+- PocketDoc/Dans-Prosemaxx-Gutenberg (all samples)
+- ToastyPigeon/SpringDragon-Instruct (~500 samples)
+- allura-org/fujin-cleaned-stage-2 (~500 samples)
+The instruct format is standard ChatML:
+<|im_start|>system
+{system prompt}<|im_end|>
+<|im_start|>user
+{user message}<|im_end|>
+<|im_start|>assistant
+{assistant response}<|im_end|>
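
For readers assembling prompts by hand, the ChatML layout above can be sketched in Python roughly as follows. The helper name `build_chatml_prompt` and the message dictionaries are illustrative assumptions, not part of the model card; only the `<|im_start|>` / `<|im_end|>` layout comes from the text above.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper: each turn becomes
    <|im_start|>{role}\n{content}<|im_end|>\n
    """
    parts = []
    for message in messages:
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Begin a tale about a lighthouse."},
    ]
    print(build_chatml_prompt(demo))
```

Most frontends apply this template automatically from the GGUF metadata, so a helper like this is mainly useful when calling a raw completion endpoint.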
+Recommended sampler settings:
+- temp 1
+- smoothing factor 0.5, smoothing curve 1
+- DRY 0.5/1.75/5/1024
+There may be better sampler settings, but these have at least proven stable in my testing. InternLM3 requires aggressive tail filtering (high min-p, top-a, or something similar) to avoid making strange typos and spelling mistakes. Note: this might be a current issue with llama.cpp and the GGUF versions I tested.
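
To make "tail filtering" concrete, here is a minimal sketch of min-p filtering, one of the options mentioned above: tokens whose probability falls below `min_p` times the top token's probability are dropped and the remainder is renormalized. The function names, the toy logits, and the `min_p=0.25` value are assumptions for illustration; the model card does not give a specific min-p number.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Convert raw logits to probabilities at the given temperature."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp((val - peak) / temperature) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


def min_p_filter(probs: Dict[str, float], min_p: float) -> Dict[str, float]:
    """Keep tokens with probability >= min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    # Toy logits (hypothetical): one plausible word plus low-probability misspellings.
    toy_logits = {"music": 4.0, "melody": 3.2, "muisc": 0.5, "mussic": 0.1}
    probs = softmax(toy_logits, temperature=1.0)
    print(min_p_filter(probs, min_p=0.25))
```

A higher `min_p` cuts more of the tail, which is the behavior the paragraph above suggests InternLM3 needs.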
+Notes:
+I noticed this model sometimes has trouble outputting the EOS token (despite confirming that <|im_end|> appears at the end of every turn in the training data). This can cause it to ramble at the end of a message instead of ending its turn.
+You can either trim the end of its messages until it picks up the right response length, or use logit bias. I've had success getting right-sized turns by setting the logit bias for <|im_end|> to 2.
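
As a rough illustration of what a logit bias of 2 does, the sketch below adds the bias to one token's logit before the softmax. The toy logits and function names are assumptions; how the bias is actually passed depends on the frontend or server in use, which the note above does not specify.

```python
import math
from typing import Dict


def apply_logit_bias(logits: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Add a per-token bias to the raw logits (tokens without a bias are unchanged)."""
    return {tok: val + bias.get(tok, 0.0) for tok, val in logits.items()}


def to_probs(logits: Dict[str, float]) -> Dict[str, float]:
    """Plain softmax over the logits."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


if __name__ == "__main__":
    # Toy next-token logits at a point where the turn could end (hypothetical values).
    toy_logits = {"<|im_end|>": 0.0, " and": 1.0, ".": 0.5}
    before = to_probs(toy_logits)
    after = to_probs(apply_logit_bias(toy_logits, {"<|im_end|>": 2.0}))
    print(f"P(<|im_end|>) before: {before['<|im_end|>']:.3f}, after: {after['<|im_end|>']:.3f}")
```

Because the softmax exponentiates logits, a bias of 2 multiplies the odds of `<|im_end|>` relative to any other single token by e^2 (about 7.4), which nudges the model toward ending its turn without forcing it.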
+---
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)