---
license: apache-2.0
---

# Mistral-7B-DadJokes-GGUF

This is a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model fine-tuned on the [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), in GGUF format. It can generate cringe and toxic jokes from any short prompt:

```
[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
[INST] What is a lawyer's favorite drink? [/INST] Whine.
[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
```

## Used data

We use the [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), formatted as base+punchline tuples. The model's task is to predict the punchline given the base. The prompt format is the same as for the original Mistral-7B-v0.1 model: `[INST] base [/INST] punchline`

## Fine-tuning process

The model was fine-tuned with QLoRA using the [LLM_QLORA](https://github.com/georgesung/llm_qlora/) trainer script with the following configuration:

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_family: llama           # if unspecified will use AutoModelForCausalLM/AutoTokenizer
model_context_window: 256     # if unspecified will use tokenizer.model_max_length
data:
  type: dadjoke
  train: "dadjokes/dataset/train.csv"
  eval: "dadjokes/dataset/test.csv"
lora:
  r: 8
  lora_alpha: 32
  target_modules:             # modules for which to train lora adapters
    - q_proj
    - k_proj
    - v_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
trainer:
  batch_size: 8
  gradient_accumulation_steps: 1
  warmup_steps: 100
  num_train_epochs: 1
  learning_rate: 0.0002       # 2e-4
  logging_steps: 20
trainer_output_dir: trainer_outputs/
model_output_dir: models/
```

Fine-tuning took ~70 minutes on a single RTX 4090.

## Running the model locally

This model can be run with [llama.cpp](https://github.com/ggerganov/llama.cpp) on a CPU using the following command:

```
./main -n 64 -m models/ggml-model-q4.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0

[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat [end of text]

llama_print_timings:        load time =  439.38 ms
llama_print_timings:      sample time =    4.62 ms /  6 runs   (  0.77 ms per token, 1298.98 tokens per second)
llama_print_timings: prompt eval time = 1786.76 ms / 18 tokens ( 99.26 ms per token,   10.07 tokens per second)
llama_print_timings:        eval time =  833.66 ms /  5 runs   (166.73 ms per token,    6.00 tokens per second)
llama_print_timings:       total time = 2627.55 ms
Log end
```

## License

Apache 2.0
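
## Running from Python (llama-cpp-python)

As an alternative to the llama.cpp CLI shown above, the same GGUF file can be loaded from Python. The snippet below is a minimal sketch, not part of the original training or inference setup: it assumes the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings are installed (`pip install llama-cpp-python`) and that `models/ggml-model-q4.gguf` is the same quantized file used in the CLI example.

```python
# Minimal sketch: running the GGUF model via llama-cpp-python (assumed setup,
# not covered by the original instructions above).
from llama_cpp import Llama

# Load the quantized model; path assumed to match the CLI example.
llm = Llama(model_path="models/ggml-model-q4.gguf", n_ctx=512)

# Same [INST] base [/INST] prompt format the model was fine-tuned on.
prompt = "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

out = llm(
    prompt,
    max_tokens=64,     # mirrors `-n 64` from the CLI example
    temperature=0.8,   # same sampling temperature as the CLI run
    stop=["[INST]"],   # stop before the model starts a new joke
)
print(out["choices"][0]["text"].strip())
```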