---

license: cc-by-sa-3.0

extra_gated_prompt: "flant5-dolly-xxl caveats"

extra_gated_fields:
  I understand that this model is just a fine-tuned version of flan-t5 on the databricks-dolly-15k dataset: checkbox
  I understand that the model is NOT safety-guarded and may generate explicitly biased and/or offensive text: checkbox
  I understand that the model outputs are not endorsed by the fine-tuner or the original model/dataset creators or anyone else: checkbox
  I understand that this model is inappropriate for direct application in production use-cases and the model trainer gives no guarantees on model quality: checkbox

---

This model is [flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) fine-tuned on [databricks-dolly-15k](https://github.com/databrickslabs/dolly/tree/master/data).

The model was fine-tuned in mid-April 2023 by Jack Hessel, a Research Scientist at AI2. Here's a screenshot of a demo I made with it:

<p align="center">
<img src="https://huggingface.co/jmhessel/flant5-dolly-xxl/resolve/main/demo_image.png" width=600px>
</p>

## fine-tuning details

Fine-tuning was done with the t5x library on a v3-128 TPU. Input/output sequence lengths were set to 512/256 tokens.

Roughly 5K fine-tuning steps were taken with an effective batch size of ~320; model checkpoints were saved every 100 steps.

The Adafactor optimizer was used with a learning rate of `0.0007`.

The checkpoint with the best balance of validation BLEU and loss was selected.
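
For reference, here's the same setup restated as Hugging Face `Seq2SeqTrainingArguments`. This is only a sketch: the actual run used t5x gin configs on TPU, and anything not stated above (the output directory, the per-device batch split) is an illustrative assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only -- the real run used t5x on TPU, not the Hugging Face Trainer.
args = Seq2SeqTrainingArguments(
    output_dir="flant5-dolly-xxl-ft",  # hypothetical output directory
    max_steps=5_000,                   # ~5K fine-tuning steps
    per_device_train_batch_size=8,     # assumed split; only the ~320 effective total is reported
    gradient_accumulation_steps=1,
    learning_rate=7e-4,                # Adafactor learning rate of 0.0007
    optim="adafactor",
    save_steps=100,                    # checkpoint every 100 steps
    predict_with_generate=True,        # generate during eval, e.g., for validation BLEU
    generation_max_length=256,         # output length; the 512 input length is applied at tokenization time
)
```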

I added very crude support for newline and tab inputs/outputs for T5, whose tokenizer otherwise drops them. Here's how to process the inputs and outputs to get newlines back (my apologies for the crude hack!):

```python
# Map newlines/tabs to sentinel strings before tokenization (the T5 tokenizer drops them otherwise).
text_in = ' '.join(text_in.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
inputs = tokenizer(text_in, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=256, temperature=temp, top_p=top_p, do_sample=True)
out = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
# Map the sentinels back to newlines/tabs; the replacements match without the leading '<' since decoding may drop it.
out = out.replace(' _LB_> ', '\n').replace(' _T_> ', '\t').replace('_LB_>', '\n').replace('_T_> ', '\t').replace('_T_>', '\t')
```
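
The snippet above assumes that `tokenizer`, `model`, `temp`, and `top_p` already exist. Here's a minimal (untested) loading sketch; the dtype/device settings are just reasonable defaults for an xxl-sized checkpoint, not a verified recipe:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("jmhessel/flant5-dolly-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "jmhessel/flant5-dolly-xxl",
    torch_dtype=torch.bfloat16,  # assumption: bf16 to make the ~11B parameters easier to fit
    device_map="auto",           # requires `accelerate`; adjust for your hardware
)

temp, top_p = 0.7, 0.95  # arbitrary sampling settings for the snippet above
```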

## why?

The purpose of fine-tuning the model was to better understand how flan-t5 performs when simply fine-tuned on a human-request-style instruction-tuning corpus.

Observations:

- It's possible to make more creative requests now, e.g., "Generate a poem about large language models".

- Accuracy is variable: the model regularly and confidently outputs falsehoods.

- The model's outputs are not self-consistent, e.g., if you generate more than once for the same prompt, you will get contradictory outputs.

- Chain-of-thought prompting kind of works, and persona-style prefixes like "you are an ethical and honest language model" somewhat affect outputs (see the sketch after this list).
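
As a concrete illustration of the prompts mentioned above, here's a small helper that reuses `tokenizer`/`model` from the loading sketch plus the newline hack. The prompts and sampling settings are just examples, not a recommended configuration.

```python
def generate(prompt, temp=0.7, top_p=0.95):
    """Sample one completion, applying the crude newline/tab hack from above."""
    text_in = ' '.join(prompt.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
    inputs = tokenizer(text_in, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs["input_ids"], max_new_tokens=256,
                             temperature=temp, top_p=top_p, do_sample=True)
    out = tokenizer.decode(outputs[0], skip_special_tokens=True,
                           clean_up_tokenization_spaces=False)
    return out.replace(' _LB_> ', '\n').replace(' _T_> ', '\t').replace('_LB_>', '\n').replace('_T_>', '\t')

# A creative request, and the same request with a persona-style prefix.
print(generate("Generate a poem about large language models."))
print(generate("You are an ethical and honest language model. Generate a poem about large language models."))

# Sampling the same prompt twice can surface contradictory outputs.
print(generate("What is the tallest building in the world?"))
print(generate("What is the tallest building in the world?"))
```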

## why release?

While it is fairly easy to get most (un-RLHFed) language models to output offensive text, there's something jarring about playing with a language model to which you can make potentially offensive imperative requests.

I hope this model can help folks explore instruction-tuned models and further understand the importance of safeguards.
|