---
license: apache-2.0
---

# Mistral-7B-DadJokes-GGUF

This is a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model fine-tuned on the [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), in GGUF format. It can generate cringe and toxic jokes from any short prompt:

```
[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
[INST] What is a lawyer's favorite drink? [/INST] Whine.
[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
```

## Used data

We use the [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), formatted as base+punchline tuples. The model's task is to predict the punchline given the base. The prompt format is the same as for the original Mistral-7B-v0.1 model: `[INST] base [/INST] punchline`

## Fine-tuning process

The model was fine-tuned with QLoRA using the [LLM_QLORA](https://github.com/georgesung/llm_qlora/) trainer script with the following configuration:

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_family: llama           # if unspecified will use AutoModelForCausalLM/AutoTokenizer
model_context_window: 256     # if unspecified will use tokenizer.model_max_length
data:
  type: dadjoke
  train: "dadjokes/dataset/train.csv"
  eval: "dadjokes/dataset/test.csv"
lora:
  r: 8
  lora_alpha: 32
  target_modules:             # modules for which to train lora adapters
    - q_proj
    - k_proj
    - v_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
trainer:
  batch_size: 8
  gradient_accumulation_steps: 1
  warmup_steps: 100
  num_train_epochs: 1
  learning_rate: 0.0002       # 2e-4
  logging_steps: 20
trainer_output_dir: trainer_outputs/
model_output_dir: models/
```

Fine-tuning took ~70 minutes on a single RTX 4090.

## Running the model locally

This model can be run with [llama.cpp](https://github.com/ggerganov/llama.cpp) on a CPU using the following command:

```
./main -n 64 -m models/ggml-model-q4.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0

[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat [end of text]

llama_print_timings:        load time =  439.38 ms
llama_print_timings:      sample time =    4.62 ms /  6 runs   (  0.77 ms per token, 1298.98 tokens per second)
llama_print_timings: prompt eval time = 1786.76 ms / 18 tokens ( 99.26 ms per token,   10.07 tokens per second)
llama_print_timings:        eval time =  833.66 ms /  5 runs   (166.73 ms per token,    6.00 tokens per second)
llama_print_timings:       total time = 2627.55 ms
Log end
```

## License

Apache 2.0
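
## Running from Python (llama-cpp-python)

As an alternative to the llama.cpp CLI shown above, the same GGUF file can be loaded from Python. The snippet below is a minimal sketch, not part of the original training or inference setup: it assumes the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings are installed (`pip install llama-cpp-python`) and that `models/ggml-model-q4.gguf` is the same quantized file used in the CLI example.

```python
# Minimal sketch: running the GGUF model via llama-cpp-python (assumed setup,
# not covered by the original instructions above).
from llama_cpp import Llama

# Load the quantized model; path assumed to match the CLI example.
llm = Llama(model_path="models/ggml-model-q4.gguf", n_ctx=512)

# Same [INST] base [/INST] prompt format the model was fine-tuned on.
prompt = "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

out = llm(
    prompt,
    max_tokens=64,     # mirrors `-n 64` from the CLI example
    temperature=0.8,   # same sampling temperature as the CLI run
    stop=["[INST]"],   # stop before the model starts a new joke
)
print(out["choices"][0]["text"].strip())
```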