---
language: en
tags:
- roberta
license: mit
---
# RoBERTa base model fine-tuned on pronoun fill masking
This is RoBERTa base fine-tuned to fill masked pronouns only.
The model is intended for post-processing machine-translated text, where
sentence-level translation may lack the context needed to choose the correct
pronoun.
This model was trained on 10B tokens of literature (a private light novel and book dataset, plus books1 and 20\% of books3 from The Pile).
It achieves 88\% top-1 accuracy when evaluated with a sliding window of 512 tokens (84\% without a sliding window).
### How to use
Mask *all* pronoun tokens, then use the fill-mask pipeline to get the
model's predictions.
```python
PRONOUN_TOKENS = {
    'I', 'ĠI',
    'you', 'You', 'Ġyou', 'ĠYou',
    'he', 'He', 'Ġhe', 'ĠHe',
    'she', 'She', 'Ġshe', 'ĠShe',
    'it', 'It', 'Ġit', 'ĠIt',
    'we', 'We', 'Ġwe', 'ĠWe',
    'they', 'They', 'Ġthey', 'ĠThey',
    'my', 'My', 'Ġmy', 'ĠMy',
    'your', 'Your', 'Ġyour', 'ĠYour',
    'his', 'His', 'Ġhis', 'ĠHis',
    'her', 'Her', 'Ġher', 'ĠHer',
    'its', 'Its', 'Ġits', 'ĠIts',
    'our', 'Our', 'Ġour', 'ĠOur',
    'their', 'Their', 'Ġtheir', 'ĠTheir',
    'mine', 'Mine', 'Ġmine', 'ĠMine',
    'yours', 'Yours', 'Ġyours', 'ĠYours',
    'hers', 'Hers', 'Ġhers', 'ĠHers',
    'ours', 'Ours', 'Ġours', 'ĠOurs',
    'theirs', 'Theirs', 'Ġtheirs', 'ĠTheirs',
}
```
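The masking step can be sketched roughly as follows. This is only an illustration, not the author's code: the helper `mask_pronoun_tokens`, the constant `MASK_TOKEN`, and the hand-written token list are assumptions, and the set is abbreviated so the snippet runs on its own. In real use the full `PRONOUN_TOKENS` set would take its place.

```python
# Abbreviated stand-in for the full PRONOUN_TOKENS set (illustration only).
PRONOUN_TOKENS = {
    "he", "He", "Ġhe", "ĠHe",
    "she", "She", "Ġshe", "ĠShe",
    "her", "Her", "Ġher", "ĠHer",
}

MASK_TOKEN = "<mask>"  # RoBERTa's mask token string


def mask_pronoun_tokens(tokens, mask_token=MASK_TOKEN):
    """Return a copy of `tokens` with every pronoun token replaced by the mask.

    Hypothetical helper: it only checks set membership, so it masks exactly
    the tokens listed in PRONOUN_TOKENS and nothing else.
    """
    return [mask_token if tok in PRONOUN_TOKENS else tok for tok in tokens]


# Hand-written token list in the style of RoBERTa's byte-level BPE output,
# where "Ġ" marks a token that was preceded by a space.
tokens = ["She", "Ġsaid", "Ġhe", "Ġtook", "Ġher", "Ġbook", "."]
masked = mask_pronoun_tokens(tokens)
print(masked)
# ['<mask>', 'Ġsaid', '<mask>', 'Ġtook', '<mask>', 'Ġbook', '.']
```

In practice the token list would presumably come from the model's tokenizer (for example `tokenizer.tokenize(text)` in the `transformers` library), and the masked sequence would then be passed to a `fill-mask` pipeline loaded with this model. How the masked tokens are turned back into pipeline input is left open here.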