UNIST-Eunchan committed
Commit 0e1f401
1 Parent(s): 9c4c56e

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -279,7 +279,7 @@ This model is a fine-tuned version of [google/flan-t5-large](https://huggingface
 
 
 ### Load model directly
-```(python)
+```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
 tokenizer = AutoTokenizer.from_pretrained("UNIST-Eunchan/FLAN-T5-NLP-Paper-to-Question-Generation")
@@ -287,7 +287,7 @@ model = AutoModelForSeq2SeqLM.from_pretrained("UNIST-Eunchan/FLAN-T5-NLP-Paper-t
 ```
 
 ### Prompting Input
-```(python)
+```python
 txt = r"""
 Generate Question, Answer pair correspond to the following research paper.
 [Abstract] + {text['abstract']} + [Introduction] + {text['introduction']}
@@ -298,15 +298,15 @@ inputs = tokenizer(txt, max_length = 1024, truncation=True, padding="max_length"
 ```
 
 ### For Multiple Question Generation (👍)
-```(python)
+```python
 summaries = model.generate(input_ids =inputs["input_ids"], max_new_tokens=100, do_sample = True, top_p = 0.95, num_return_sequences = 4)
 ```
 ### For Single Question Generation
-```(python)
+```python
 summaries = model.generate(input_ids =inputs["input_ids"], max_new_tokens=100, do_sample = True, top_p = 0.95)
 ```
 
-```
+```python
 decoded_summaries = [tokenizer.decode(s, skip_special_tokens=False, clean_up_tokenization_spaces=True) for s in summaries]
 decoded_summaries = [d.replace("<n>", " ").replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "") for d in decoded_summaries]
 
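Strung together, the snippets in the hunks above form one pipeline: load the model, build the prompt from a paper's abstract and introduction, tokenize, sample, and strip special tokens. A minimal end-to-end sketch of that flow, where the `text` dict is a placeholder for your own paper and `return_tensors="pt"` is assumed for the tokenizer call that the hunk header truncates:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "UNIST-Eunchan/FLAN-T5-NLP-Paper-to-Question-Generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder paper sections -- substitute a real abstract and introduction.
text = {"abstract": "...", "introduction": "..."}

# Prompt template from the README, with the bracketed section tags filled in.
txt = (
    "Generate Question, Answer pair correspond to the following research paper.\n"
    f"[Abstract] {text['abstract']} [Introduction] {text['introduction']}"
)

inputs = tokenizer(txt, max_length=1024, truncation=True,
                   padding="max_length", return_tensors="pt")

# Top-p sampling; num_return_sequences sets how many Q-A pairs come back.
summaries = model.generate(input_ids=inputs["input_ids"], max_new_tokens=100,
                           do_sample=True, top_p=0.95, num_return_sequences=4)

# Decode and strip pad/eos markers, as in the snippet above.
decoded_summaries = [tokenizer.decode(s, skip_special_tokens=False,
                                      clean_up_tokenization_spaces=True)
                     for s in summaries]
decoded_summaries = [d.replace("<n>", " ")
                      .replace(tokenizer.pad_token, "")
                      .replace(tokenizer.eos_token, "")
                     for d in decoded_summaries]
```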
@@ -316,14 +316,14 @@ decoded_summaries = [d.replace("<n>", " ").replace(tokenizer.pad_token, "").repl
 - about 60x faster than (1) [CPU --> COLAB T4 GPU]
 
 ### Additional Installation
-```(python)
+```python
 !pip install accelerate -q
 !pip install bitsandbytes -q
 !pip install optimum -q
 ```
 
 ### Load model directly
-```(python)
+```python
 import torch
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM,BitsAndBytesConfig
 from optimum.bettertransformer import BetterTransformer
@@ -341,7 +341,7 @@ model = BetterTransformer.transform(model)
 
 
 ### For Multiple Question Generation (👍)
-```(python)
+```python
 # use to(device)
 summaries = model.generate(input_ids =inputs["input_ids"].to(device), max_new_tokens=100, do_sample = True, top_p = 0.95, num_return_sequences = 4)
 ```
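The hunks above show only the imports, the `BetterTransformer.transform(model)` call, and the GPU-side generate; the quantized loading step in between falls outside the diff. A sketch of how those pieces could fit together, assuming 8-bit loading via `BitsAndBytesConfig` with `device_map="auto"` (an assumption, not taken from the README):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig
from optimum.bettertransformer import BetterTransformer

model_id = "UNIST-Eunchan/FLAN-T5-NLP-Paper-to-Question-Generation"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed 8-bit quantization setup; the actual config is not visible in this diff.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = BetterTransformer.transform(model)  # as in the diff context above

# Same prompt/tokenization as the CPU example; "..." stands in for the paper text.
txt = ("Generate Question, Answer pair correspond to the following research paper.\n"
       "[Abstract] ... [Introduction] ...")
inputs = tokenizer(txt, max_length=1024, truncation=True,
                   padding="max_length", return_tensors="pt")

# Inputs are moved to the GPU before sampling.
summaries = model.generate(input_ids=inputs["input_ids"].to(device),
                           max_new_tokens=100, do_sample=True, top_p=0.95,
                           num_return_sequences=4)
```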
@@ -375,7 +375,7 @@ It achieves the following results on the evaluation set:
 ### Generated Output Example
 - Our model generate 16 different Q-A Pair with top-p sampling.
 
-```(python)
+```python
 input: r"""
 Generate Question, Answer pair correspond to the following research paper.
  [Abstract] In this work, we explore prompt tuning, a simple yet effective mechanism for learning soft prompts to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method closes the gap and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed prefix tuning of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning. [Introduction] With the wide success of pre-trained large language models, a range of techniques has arisen to adapt these general-purpose models to downstream tasks. ELMo (Peters et al., 2018) proposed freezing the pre-trained model and learning a task-specific weighting of its per-layer representations. However, since GPT (Radford et al., 2018) and BERT (Devlin et al., 2019), the dominant adaptation technique has been model tuning (or fine-tuning), where all model parameters are tuned during adaptation, as proposed by Howard and Ruder (2018).More recently, Brown et al. (2020) showed that prompt design (or priming) is surprisingly effective at modulating a frozen GPT-3 model’s behavior through text prompts. Prompts are typically composed of a task description and/or several canonical examples. This return to freezing pre-trained models is appealing, especially as model size continues to increase. Rather than requiring a separate copy of the model for each downstream task, a single generalist model can simultaneously serve many different tasks. Unfortunately, prompt-based adaptation has several key drawbacks. Task description is error-prone and requires human involvement, and the effectiveness of a prompt is limited by how much conditioning text can fit into the model’s input. As a result, downstream task quality still lags far behind that of tuned models. For instance, GPT-3 175B fewshot performance on SuperGLUE is 17.5 points below fine-tuned T5-XXL (Raffel et al., 2020) (71.8 vs. 89.3) despite using 16 times more parameters. Several efforts to automate prompt design have been recently proposed. Shin et al. (2020) propose a search algorithm over the discrete space of words, guided by the downstream application training data. While this technique outperforms manual prompt design, there is still a gap relative to model tuning. Li and Liang (2021) propose prefix tuning and show strong results on generative tasks. This method freezes the model parameters and backpropagates the error during tuning to prefix activations prepended to each layer in the encoder stack, including the input layer. Hambardzumyan et al. 
(2021) simplify this recipe by restricting the trainable parameters to the input and output subnetworks of a masked language model, and show reasonable results on classifications tasks. In this paper, we propose prompt tuning as a further simplification for adapting language models. We freeze the entire pre-trained model and only allow an additional k tunable tokens per downstream task to be prepended to the input text. This soft prompt is trained end-to-end and can condense the signal from a full labeled dataset, allowing our method to outperform few-shot prompts and close the quality gap with model tuning (Figure 1). At the same time, since a single pre-trained model is recycled for all downstream tasks, we retain the efficient serving benefits of frozen models (Figure 2). While we developed our method concurrently with Li and Liang (2021) and Hambardzumyan et al. (2021), we are the first to show that prompt tuning alone (with no intermediate-layer prefixes or task-specific output layers) is sufficient to be competitive with model tuning. Through detailed experiments in sections 2–3, we demonstrate that language model capacity is a key ingredient for these approaches to succeed. As Figure 1 shows, prompt tuning becomes more competitive with scale. We compare with similar approaches in Section 4. Explicitly separating task-specific parameters from the generalist parameters needed for general language-understanding has a range of additional benefits. We show in Section 5 that by capturing the task definition in the prompt while keeping the generalist parameters fixed, we are able to achieve better resilience to domain shifts. In Section 6, we show that prompt ensembling, learning multiple prompts for the same task, can boost quality and is more efficient than classic model ensembling. Finally, in Section 7, we investigate the interpretability of our learned soft prompts. In sum, our key contributions are: 1. Proposing prompt tuning and showing its competitiveness with model tuning in the regime of large language models. 2. Ablating many design choices, and showing quality and robustness improve with scale. 3. Showing prompt tuning outperforms model tuning on domain shift problems. 4. Proposing prompt ensembling and showing its effectiveness.
 
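The example-output section above states that 16 different Q-A pairs are drawn with top-p sampling; with the earlier setup, that amounts to raising `num_return_sequences` in the generate call. A sketch reusing `model` and `inputs` from the previous snippets:

```python
# Draw 16 sampled Q-A pairs for one paper, as described in the example-output section.
summaries = model.generate(
    input_ids=inputs["input_ids"],   # `inputs` from the tokenization step above
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    num_return_sequences=16,
)
```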