---
datasets:
- IteraTeR_full_sent
---

# IteraTeR PEGASUS model

This model was obtained by fine-tuning [google/pegasus-large](https://huggingface.co/google/pegasus-large) on the [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset.

Paper: [Understanding Iterative Revision from Human-Written Text](https://arxiv.org/abs/2203.03802) <br>
Authors: Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez, Dongyeop Kang

## Text Revision Task

Given an edit intention and an original sentence, our model can generate a revised sentence.<br>
The edit intentions are provided by the [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset and are categorized as follows:

<table>
<tr>
<th>Edit Intention</th>
<th>Definition</th>
<th>Example</th>
</tr>
<tr>
<td>clarity</td>
<td>Make the text more formal, concise, readable and understandable.</td>
<td>
Original: It's like a house which anyone can enter in it. <br>
Revised: It's like a house which anyone can enter.
</td>
</tr>
<tr>
<td>fluency</td>
<td>Fix grammatical errors in the text.</td>
<td>
Original: In the same year he became the Fellow of the Royal Society. <br>
Revised: In the same year, he became the Fellow of the Royal Society.
</td>
</tr>
<tr>
<td>coherence</td>
<td>Make the text more cohesive, logically linked and consistent as a whole.</td>
<td>
Original: Achievements and awards Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy. <br>
Revised: Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy.
</td>
</tr>
<tr>
<td>style</td>
<td>Convey the writer’s writing preferences, including emotions, tone, voice, etc.</td>
<td>
Original: She was last seen on 2005-10-22. <br>
Revised: She was last seen on October 22, 2005.
</td>
</tr>
</table>
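
The model input pairs an edit intention with the original sentence. The sketch below shows one way to build that input string. It assumes every intention in the table uses the same `<intent> sentence` layout as the `<fluency>` prefix in the Usage section; only the `<fluency>` form appears in this card, so the other tags are an assumption, and `build_revision_input` and `EDIT_INTENTS` are hypothetical names introduced for illustration.

```python
# Intent labels taken from the table above. The "<intent> sentence" layout is
# inferred from the <fluency> usage example and ASSUMED to hold for all labels.
EDIT_INTENTS = ("clarity", "fluency", "coherence", "style")


def build_revision_input(intent: str, sentence: str) -> str:
    """Prefix a sentence with its edit-intention tag (hypothetical helper)."""
    intent = intent.strip().lower()
    if intent not in EDIT_INTENTS:
        raise ValueError(f"unknown edit intention: {intent!r}")
    return f"<{intent}> {sentence.strip()}"


print(build_revision_input("fluency", "I likes coffee."))
# prints: <fluency> I likes coffee.
```

Rejecting unknown labels up front avoids silently sending the model a tag it may never have seen during fine-tuning.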
|
|
|
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")

# Prefix the original sentence with the edit intention tag.
before_input = '<fluency> I likes coffee.'
model_input = tokenizer(before_input, return_tensors='pt')

# Generate the revision with beam search, then decode it back to text.
model_outputs = model.generate(**model_input, num_beams=8, max_length=1024)
after_text = tokenizer.batch_decode(model_outputs, skip_special_tokens=True)[0]
print(after_text)
```
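
To revise several sentences with the same edit intention, the single-sentence call above can be wrapped in a small batching helper. This is a minimal sketch, not part of the released code: `revise_batch` is a hypothetical name, it expects an already loaded `tokenizer` and `model`, and it assumes padded batches behave like single inputs for this checkpoint.

```python
from typing import List


def revise_batch(sentences: List[str], intent: str, tokenizer, model,
                 num_beams: int = 8, max_length: int = 1024) -> List[str]:
    """Revise several sentences in one generate() call (hypothetical helper)."""
    # Same "<intent> sentence" layout as the single-sentence example.
    prompts = [f"<{intent}> {sentence}" for sentence in sentences]
    # Pad to the longest prompt so the batch forms one rectangular tensor.
    encoded = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**encoded, num_beams=num_beams, max_length=max_length)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call, assuming `tokenizer` and `model` were loaded as shown above:
# revised = revise_batch(["I likes coffee.", "She go home."], "fluency", tokenizer, model)
```

Passing the tokenizer and model as arguments keeps the helper free of loading logic, so the weights are downloaded once and reused across calls.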