Update README.md

d3942f1 verified 1 day ago

3.97 kB

	---
	license: apache-2.0
	language:
	- en
	- zh
	base_model: allura-org/Bigger-Body-12b
	library_name: transformers
	tags:
	- axolotl
	- roleplay
	- conversational
	- chat
	- llama-cpp
	- gguf-my-repo
	---

	# Triangle104/Bigger-Body-12b-Q4_K_M-GGUF
	This model was converted to GGUF format from [`allura-org/Bigger-Body-12b`](https://huggingface.co/allura-org/Bigger-Body-12b) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/allura-org/Bigger-Body-12b) for more details on the model.

	---
	A roleplay-focused pseudo full-finetune of Mistral Nemo Instruct.
	The successor to the Ink series.







	Testimonials






	First impressions (temp 1, min-p .05-.1)


	It passes my silly logic tests (read: me trolling random characters)
	Haven't seen any slop yet
	Writes short and snappy replies
	...yet not too short, like Mahou, and can write longer responses if the context warrants it
	Follows card formatting instructions


	If this holds up to 16K it will be constantly in the hopper alongside
	Mag-Mell for me. I'm biased towards shorter responses with smarts. :)




	- Tofumagate




	tantalizing writing, leagues better then whatever is available online




	- Bowza




	Fun to use, nice swipe variation, gives me lots to RP off of. Rarely, it'll start to loop, but a quick swipe fixes no problem.




	- AliCat







	Dataset




	The Bigger Body (referred to as Ink v2.1, because that's still the
	internal name) mix is absolutely disgusting. It's even more cursed than
	the original Ink mix.


	(Public) Original Datasets



	Fizzarolli/limarp-processed
	Norquinal/OpenCAI - two_users split
	allura-org/Celeste1.x-data-mixture
	mapsila/PIPPA-ShareGPT-formatted-named
	allenai/tulu-3-sft-personas-instruction-following
	readmehay/medical-01-reasoning-SFT-json
	LooksJuicy/ruozhiba
	shibing624/roleplay-zh-sharegpt-gpt4-data
	CausalLM/Retrieval-SFT-Chat
	ToastyPigeon/fujin-filtered-instruct

	Recommended Settings




	Chat template: Mistral v7-tekken (NOT v3-tekken !!!! the main difference is that v7 has specific [SYSTEM_PROMPT] and [/SYSTEM_PROMPT] tags)
	Recommended samplers (not the be-all-end-all, try some on your own!):


	Temp 1.25 / MinP 0.1







	Hyperparams









	General




	Epochs = 2
	LR = 1e-5
	LR Scheduler = Cosine
	Optimizer = Apollo-mini
	Optimizer target modules = all_linear
	Effective batch size = 16
	Weight Decay = 0.01
	Warmup steps = 50
	Total steps = 920







	Credits




	Humongous thanks to the people who created the data. I would credit you all, but that would be cheating ;)
	Big thanks to all Allura members for testing and emotional support ilya /platonic

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/Bigger-Body-12b-Q4_K_M-GGUF --hf-file bigger-body-12b-q4_k_m.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/Bigger-Body-12b-Q4_K_M-GGUF --hf-file bigger-body-12b-q4_k_m.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
	```
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```
	./llama-cli --hf-repo Triangle104/Bigger-Body-12b-Q4_K_M-GGUF --hf-file bigger-body-12b-q4_k_m.gguf -p "The meaning to life and the universe is"
	```
	or
	```
	./llama-server --hf-repo Triangle104/Bigger-Body-12b-Q4_K_M-GGUF --hf-file bigger-body-12b-q4_k_m.gguf -c 2048
	```