---
license: apache-2.0
---

# Mistral-7B-DadJokes-GGUF

This is a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model fine-tuned on the [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes) and provided in GGUF format. It can generate cringeworthy and toxic jokes from any short prompt:

```
[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
[INST] What is a lawyer's favorite drink? [/INST] Whine.
[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
```

## Used data

We use the [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes), formatted as base+punchline tuples. The model's task is to predict the punchline given the base. The prompt format is the same as for the original Mistral-7B-v0.1 model:

`[INST] base [/INST] punchline`

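As a minimal sketch of the template above, the snippet below wraps a base in the `[INST] ... [/INST]` markers and optionally appends the punchline. The function names `format_prompt` and `format_training_example` are hypothetical and only illustrate the string layout; the actual preprocessing in the trainer script may differ, for example in how special tokens are added.

```python
def format_prompt(base: str) -> str:
    """Wrap a joke setup (the "base") in the [INST] ... [/INST] template."""
    return f"[INST] {base.strip()} [/INST]"


def format_training_example(base: str, punchline: str) -> str:
    """Join a base and its punchline into a single training string."""
    return f"{format_prompt(base)} {punchline.strip()}"


# Example taken from the sample jokes in this card.
prompt = format_prompt("What is a lawyer's favorite drink?")
example = format_training_example("What is a lawyer's favorite drink?", "Whine.")
print(prompt)
print(example)
```

At inference time only the output of `format_prompt` is passed to the model, which is then expected to continue with the punchline.
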
## Fine-tuning process

The model was fine-tuned with QLoRA using the [LLM_QLORA](https://github.com/georgesung/llm_qlora/) trainer script with the following configuration:

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_family: llama  # if unspecified will use AutoModelForCausalLM/AutoTokenizer
model_context_window: 256  # if unspecified will use tokenizer.model_max_length
data:
  type: dadjoke
  train: "dadjokes/dataset/train.csv"
  eval: "dadjokes/dataset/test.csv"
lora:
  r: 8
  lora_alpha: 32
  target_modules:  # modules for which to train lora adapters
    - q_proj
    - k_proj
    - v_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
trainer:
  batch_size: 8
  gradient_accumulation_steps: 1
  warmup_steps: 100
  num_train_epochs: 1
  learning_rate: 0.0002  # 2e-4
  logging_steps: 20
trainer_output_dir: trainer_outputs/
model_output_dir: models/
```

Fine-tuning took ~70 minutes on a single RTX 4090.

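To give a feel for how small the trained adapter is, the following back-of-the-envelope calculation counts the LoRA parameters implied by the configuration above. It is a sketch under stated assumptions: the layer dimensions (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) are taken from the published Mistral-7B-v0.1 model configuration rather than from this card, and it assumes standard LoRA, where each adapted projection gets two low-rank matrices and no bias terms.

```python
# Assumed Mistral-7B-v0.1 dimensions (from the published model config,
# not stated in this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

# Values from the `lora` section of the config above.
LORA_R = 8
LORA_ALPHA = 32

# (in_features, out_features) for each adapted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"adapter parameters per layer: {per_layer:,}")
print(f"adapter parameters in total:  {total:,}")
print(f"LoRA scaling (alpha / r):     {scaling}")
```

Under these assumptions the adapter has roughly 4.7 million trainable parameters, a small fraction of the 7B base model, which is consistent with the short training time on a single consumer GPU.
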
## Running the model locally

This model can be run on a CPU with [llama.cpp](https://github.com/ggerganov/llama.cpp) using the following command:

```
./main -n 64 -m models/ggml-model-q4.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0


[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat [end of text]

llama_print_timings: load time = 439.38 ms
llama_print_timings: sample time = 4.62 ms / 6 runs ( 0.77 ms per token, 1298.98 tokens per second)
llama_print_timings: prompt eval time = 1786.76 ms / 18 tokens ( 99.26 ms per token, 10.07 tokens per second)
llama_print_timings: eval time = 833.66 ms / 5 runs ( 166.73 ms per token, 6.00 tokens per second)
llama_print_timings: total time = 2627.55 ms
Log end
```

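If you want to call the model from a script, one option is to wrap the command above in a small Python helper. The sketch below is illustrative only: `build_command`, `extract_punchline`, and `tell_joke` are hypothetical names, it uses only the `-n`, `-m`, and `-p` flags shown above, and it assumes that the generated text appears on standard output as a single line in the form shown in the log, ending with `[end of text]`.

```python
import subprocess
from typing import List, Optional

END_MARKER = "[end of text]"


def build_command(
    prompt: str,
    model_path: str = "models/ggml-model-q4.gguf",
    binary: str = "./main",
    n_predict: int = 64,
) -> List[str]:
    """Build the argument list for the llama.cpp command shown above."""
    return [binary, "-n", str(n_predict), "-m", model_path, "-p", prompt]


def extract_punchline(output: str) -> Optional[str]:
    """Return the text between [/INST] and [end of text], or None if absent.

    Assumes the punchline fits on the same line as the echoed prompt.
    """
    for line in output.splitlines():
        if "[/INST]" in line:
            completion = line.split("[/INST]", 1)[1]
            completion = completion.split(END_MARKER, 1)[0]
            return completion.strip()
    return None


def tell_joke(base: str) -> Optional[str]:
    """Run llama.cpp on a joke setup and return the generated punchline."""
    prompt = f"[INST] {base} [/INST]"
    result = subprocess.run(
        build_command(prompt), capture_output=True, text=True, check=True
    )
    return extract_punchline(result.stdout)
```

Calling `tell_joke("My girlfriend changed after she became a vegetarian.")` requires a compiled `./main` binary and the GGUF file at the path used above. Because sampling is enabled (`temp = 0.8` in the log), the punchline will vary between runs.
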
## License

Apache 2.0