---
license: apache-2.0
---
# Mistral-7B-DadJokes-GGUF
This is a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model fine-tuned on the [Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), provided in GGUF format. It can generate cringe and toxic jokes from any short prompt:
```
[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
[INST] What is a lawyer's favorite drink? [/INST] Whine.
[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
```
## Used data
We use the [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), formatted as base+punchline tuples. The model's task is to predict the punchline given the base. The prompt format is the same as for the original Mistral-7B-v0.1 model:
`[INST] base [/INST] punchline`
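As an illustration, a minimal sketch of how the CSV rows map onto that prompt format (the column names `base` and `punchline` and the file path are assumptions for illustration, not taken from the trainer code):
```python
import csv

def to_prompt(base: str, punchline: str) -> str:
    # Mistral-7B-v0.1 instruction format: base as the instruction, punchline as the completion.
    return f"[INST] {base} [/INST] {punchline}"

# Column names and path are illustrative assumptions.
with open("dadjokes/dataset/train.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(to_prompt(row["base"], row["punchline"]))
```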
## Fine-tuning process
The model was fine-tuned with QLoRA using the [llm_qlora](https://github.com/georgesung/llm_qlora/) trainer script with the following configuration:
```yaml
base_model: mistralai/Mistral-7B-v0.1
model_family: llama # if unspecified will use AutoModelForCausalLM/AutoTokenizer
model_context_window: 256 # if unspecified will use tokenizer.model_max_length
data:
  type: dadjoke
  train: "dadjokes/dataset/train.csv"
  eval: "dadjokes/dataset/test.csv"
lora:
  r: 8
  lora_alpha: 32
  target_modules: # modules for which to train lora adapters
    - q_proj
    - k_proj
    - v_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
trainer:
  batch_size: 8
  gradient_accumulation_steps: 1
  warmup_steps: 100
  num_train_epochs: 1
  learning_rate: 0.0002 # 2e-4
  logging_steps: 20
  trainer_output_dir: trainer_outputs/
  model_output_dir: models/
```
Fine-tuning took ~70 minutes on a single RTX 4090.
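For reference, the `lora` section above corresponds roughly to the following `peft` configuration (a sketch assuming the trainer builds a standard Hugging Face `peft.LoraConfig`; not taken from the llm_qlora source):
```python
from peft import LoraConfig

# Mirrors the lora: section of the YAML config above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```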
## Running the model locally
This model can be run with [llama.cpp](https://github.com/ggerganov/llama.cpp) on a CPU using the following command:
```
./main -n 64 -m models/ggml-model-q4.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0
[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat [end of text]
llama_print_timings: load time = 439.38 ms
llama_print_timings: sample time = 4.62 ms / 6 runs ( 0.77 ms per token, 1298.98 tokens per second)
llama_print_timings: prompt eval time = 1786.76 ms / 18 tokens ( 99.26 ms per token, 10.07 tokens per second)
llama_print_timings: eval time = 833.66 ms / 5 runs ( 166.73 ms per token, 6.00 tokens per second)
llama_print_timings: total time = 2627.55 ms
Log end
```
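Alternatively, the GGUF file can be loaded from Python via [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (a minimal sketch; the model path and sampling settings mirror the command above):
```python
from llama_cpp import Llama

# Load the quantized GGUF model on CPU (path mirrors the llama.cpp command above).
llm = Llama(model_path="models/ggml-model-q4.gguf", n_ctx=512)

prompt = "[INST] My girlfriend changed after she became a vegetarian. [/INST]"
out = llm(prompt, max_tokens=64, temperature=0.8)
print(out["choices"][0]["text"])
```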
## License
Apache 2.0