---
license: mit
datasets:
- jmhessel/newyorker_caption_contest
language:
- en
tags:
- nyc
- llama2
widget:
- text: "This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.\ncaption: Can I interest you in opening an offshore account?\nexplanation of the caption:\n"
  example_title: "Training prompt format"
- text: "In this task, you will see a description of an uncanny situation. Then, you will see a joke that was written about the situation. Explain how the joke relates to the situation and why it is funny.\n###\nThis scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.\ncaption: Can I interest you in opening an offshore account?\nexplanation of the caption:\n"
  example_title: "Paper prompt format"
- text: "This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.\ncaption: Can I interest you in opening an offshore account?\nthe caption is funny because"
  example_title: "Suggested prompt format"
---
|
|
|
# nyrkr-joker-llama
|
|
|
Takes a *New Yorker* cartoon description and caption, and attempts to explain the joke.
|
|
|
Technical details:

- Based on Llama-2-7b-hf (version 2, 7B parameters)

- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [1.2k rows of the New Yorker caption contest dataset](https://huggingface.co/datasets/jmhessel/newyorker_caption_contest)

- Merged Llama 2 with the adapter weights (from checkpoint step=160, epoch=2.7); see the merge sketch below
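
The merge step can be done with `peft`; here is a minimal sketch (the base-model repo id, checkpoint path, and output directory are illustrative, not necessarily the exact ones used):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the fp16 base model (repo id / local path is illustrative)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)

# Attach the QLoRA adapter saved during training (checkpoint path is illustrative)
model = PeftModel.from_pretrained(base, "thatsthejoke/checkpoint-160/adapter_model")

# Fold the LoRA weights into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("nyrkr-joker-llama")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("nyrkr-joker-llama")
```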
|
|
|
## Prompt options
|
|
|
Figure 10 of [the original paper](https://arxiv.org/abs/2209.06293) uses this format for joke explanations:
|
|
|
`In this task, you will see a description of an uncanny situation. Then, you will see a joke that was written about the situation. Explain how the joke relates to the situation and why it is funny.
###

{few-shot examples separated by ###, newline after "explanation of the caption:"}
This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.
caption: Can I interest you in opening an offshore account?
explanation of the caption:
`
|
|
|
In training, I used just the individual example, without the task preamble:
|
|
|
`This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.
caption: Can I interest you in opening an offshore account?
explanation of the caption:\n`
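
Each training row paired a prompt like the one above with the corresponding explanation from the dataset as the target. A hypothetical sketch of building `/content/nycaptions.jsonl` is below; the `prompt`/`completion` field names are an assumption about what qlora's `self-instruct` dataset format expects, and the completion text is a placeholder.

```python
import json

# Hypothetical example row; "prompt"/"completion" keys are assumed to match
# qlora's self-instruct dataset format, and the completion is a placeholder.
row = {
    "prompt": (
        "This scene takes place in the following location: a bank. "
        "Three people are standing in line at the bank. The bank teller is a "
        "traditional pirate with a hook hand, eye patch, and a parrot. "
        "The scene includes: Piracy, Bank teller.\n"
        "caption: Can I interest you in opening an offshore account?\n"
        "explanation of the caption:\n"
    ),
    "completion": "...explanation text from the dataset...",
}

with open("nycaptions.jsonl", "a") as f:
    f.write(json.dumps(row) + "\n")
```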
|
|
|
At inference time, I got somewhat better results with a more natural prompt (no trailing newline or space):
|
|
|
`This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller.
caption: Can I interest you in opening an offshore account?
the caption is funny because`
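
For example, a minimal generation sketch with Hugging Face Transformers (the model path and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "nyrkr-joker-llama"  # illustrative: local merged model dir or Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "This scene takes place in the following location: a bank. "
    "Three people are standing in line at the bank. The bank teller is a traditional "
    "pirate with a hook hand, eye patch, and a parrot. "
    "The scene includes: Piracy, Bank teller.\n"
    "caption: Can I interest you in opening an offshore account?\n"
    "the caption is funny because"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)

# Print only the newly generated explanation, not the echoed prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```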
|
|
|
## Training script
|
|
|
Trained on a single V100 GPU:
|
|
|
```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --output_dir ../thatsthejoke \
    --logging_steps 20 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 80 \
    --save_total_limit 10 \
    --evaluation_strategy steps \
    --max_new_tokens 64 \
    --dataloader_num_workers 1 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/nycaptions.jsonl \
    --dataset_format 'self-instruct' \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 250 \
    --eval_steps 187 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
|