	---
	license: apache-2.0
	---

	# Mistral-7B-DadJokes-GGUF

	This is a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model fine-tuned on the [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes) and distributed in GGUF format. It can generate cringeworthy and toxic jokes from any short prompt:

	```
	[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
	[INST] What is a lawyer's favorite drink? [/INST] Whine.
	[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
	[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
	[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
	```

	## Training data

	We use the [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), formatted as (base, punchline) tuples. The model's task is to predict the punchline given the base. The prompt format follows the `[INST] ... [/INST]` instruction template of the Mistral-7B-v0.1 model family:

	`[INST] base [/INST] punchline`
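
The template can be applied with two small helpers. This is a minimal sketch, not part of the original training code: the function names `format_prompt` and `format_training_example` are hypothetical, and the single space around each tag is assumed from the sample outputs shown earlier.

```python
def format_prompt(base: str) -> str:
    """Wrap the joke setup (the "base") in the instruction tags."""
    return f"[INST] {base.strip()} [/INST]"


def format_training_example(base: str, punchline: str) -> str:
    """Join a (base, punchline) pair into one training string."""
    return f"{format_prompt(base)} {punchline.strip()}"


if __name__ == "__main__":
    print(format_training_example("What is a lawyer's favorite drink?", "Whine."))
```

At inference time only `format_prompt` is needed: the model is expected to continue the text after `[/INST]` with a punchline.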

	## Fine-tuning process

	The model was fine-tuned with QLoRA using the [LLM_QLORA](https://github.com/georgesung/llm_qlora/) trainer script and the following configuration:
	```yaml
	base_model: mistralai/Mistral-7B-v0.1
	model_family: llama # if unspecified will use AutoModelForCausalLM/AutoTokenizer
	model_context_window: 256 # if unspecified will use tokenizer.model_max_length
	data:
	  type: dadjoke
	  train: "dadjokes/dataset/train.csv"
	  eval: "dadjokes/dataset/test.csv"
	lora:
	  r: 8
	  lora_alpha: 32
	  target_modules: # modules for which to train lora adapters
	    - q_proj
	    - k_proj
	    - v_proj
	  lora_dropout: 0.05
	  bias: none
	  task_type: CAUSAL_LM
	trainer:
	  batch_size: 8
	  gradient_accumulation_steps: 1
	  warmup_steps: 100
	  num_train_epochs: 1
	  learning_rate: 0.0002 # 2e-4
	  logging_steps: 20
	trainer_output_dir: trainer_outputs/
	model_output_dir: models/
	```
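
To get a feel for how small the trained adapter is, the sketch below estimates the number of trainable LoRA weights implied by `r: 8` and the three target modules. The layer dimensions are assumptions taken from the published Mistral-7B-v0.1 architecture (hidden size 4096, 32 layers, 32 attention heads, 8 key/value heads), not from this repository, and the helper name `lora_param_count` is hypothetical.

```python
# Assumed Mistral-7B-v0.1 architecture values (from the base model's public config).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128

# Values from the fine-tuning configuration above.
LORA_R = 8
LORA_ALPHA = 32

# (input features, output features) of each adapted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter has roughly 4.7 million trainable weights, well under 0.1% of the base model's parameters.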

	Fine-tuning took ~70 minutes on a single RTX 4090.

	## Running the model locally

	This model can be run on a CPU with [llama.cpp](https://github.com/ggerganov/llama.cpp) using the following command:

	```
	./main -n 64 -m models/ggml-model-q4.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

	system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
	sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
	generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0


	[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat [end of text]

	llama_print_timings: load time = 439.38 ms
	llama_print_timings: sample time = 4.62 ms / 6 runs ( 0.77 ms per token, 1298.98 tokens per second)
	llama_print_timings: prompt eval time = 1786.76 ms / 18 tokens ( 99.26 ms per token, 10.07 tokens per second)
	llama_print_timings: eval time = 833.66 ms / 5 runs ( 166.73 ms per token, 6.00 tokens per second)
	llama_print_timings: total time = 2627.55 ms
	Log end

	```
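
The same command can also be driven from a script. The sketch below assumes the `./main` binary and the `models/ggml-model-q4.gguf` path from the command above. The helper names are hypothetical, and only the `-n`, `-m`, and `-p` flags already shown are used. It also assumes the prompt is echoed on standard output before the completion, as in the log above.

```python
import subprocess

END_MARKER = "[end of text]"


def build_command(prompt, model_path="models/ggml-model-q4.gguf",
                  n_predict=64, binary="./main"):
    """Return the argv list equivalent to the shell command above."""
    return [binary, "-n", str(n_predict), "-m", model_path, "-p", prompt]


def extract_punchline(stdout, prompt):
    """Pull the generated text out of the captured standard output."""
    _, found, completion = stdout.partition(prompt)
    if not found:
        # Prompt was not echoed; fall back to the whole output.
        completion = stdout
    # Drop the end-of-text marker and anything printed after it.
    return completion.split(END_MARKER)[0].strip()


def tell_joke(base):
    """Run the binary once and return only the generated punchline."""
    prompt = f"[INST] {base} [/INST]"
    result = subprocess.run(build_command(prompt), capture_output=True,
                            text=True, check=True)
    return extract_punchline(result.stdout, prompt)
```

Calling `tell_joke("My girlfriend changed after she became a vegetarian.")` needs a compiled llama.cpp binary and the GGUF file in place. If generation stops at the `-n` limit instead of an end-of-text marker, the function returns whatever text was produced.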

	## License

	Apache 2.0