	---
	language: en
	tags:
	- roberta
	license: mit
	---

	# RoBERTa base model fine-tuned on pronoun fill masking

	This is RoBERTa base fine-tuned for fill-mask prediction of pronouns only.
	It is intended for post-processing machine-translated text, where sentence-level
	translation may lack the context needed to choose the correct pronoun.
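
The core idea can be sketched as masking one pronoun at a time so the model can predict it from the surrounding text. The minimal sketch below covers only that masking step, in plain Python. `PRONOUNS` and `masked_variants` are hypothetical names chosen for illustration and are not taken from `pronoun_fixer.py`, whose actual approach may differ. `<mask>` is RoBERTa's mask token.

```python
import re

# Hypothetical pronoun set for illustration only; the set of pronouns the
# model was actually trained to fill is not listed in this card.
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "they", "them", "their"}

MASK = "<mask>"  # RoBERTa's mask token


def masked_variants(text: str) -> list[tuple[int, str]]:
    """Return (char_offset, masked_text) pairs, one per pronoun occurrence.

    Each variant masks exactly one pronoun, so a fill-mask model can
    predict it using the rest of the text as context.
    """
    variants = []
    for match in re.finditer(r"\b[A-Za-z]+\b", text):
        if match.group().lower() in PRONOUNS:
            start, end = match.span()
            variants.append((start, text[:start] + MASK + text[end:]))
    return variants


for offset, masked in masked_variants("Cadence Lee thought he was a normal girl."):
    print(offset, masked)  # 20 Cadence Lee thought <mask> was a normal girl.
```

Masking one pronoun per variant keeps every other pronoun visible as context, at the cost of one model call per pronoun.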

	This model was trained on 10B tokens of literature: a private light novel and book dataset, plus books1 and 20% of books3 from The Pile.

	The model achieves 88% top-1 accuracy when evaluated with a 512-token sliding window (84% without a sliding window).
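
The card does not say exactly how the sliding window is applied. One plausible reading, sketched below purely as an assumption, is that each masked position is scored with a window of at most 512 tokens centered on it, and top-1 accuracy is the fraction of positions whose highest-ranked prediction equals the reference pronoun. `window_around` and `top1_accuracy` are hypothetical helpers, not part of this repository.

```python
def window_around(tokens: list[str], index: int, size: int = 512) -> tuple[list[str], int]:
    """Return at most `size` tokens centered on `index`, plus the position of
    `index` inside the returned slice. The window is shifted, not shrunk, near
    either end of the sequence.

    Note: RoBERTa's 512-token limit includes the <s> and </s> special tokens,
    so a real implementation would leave room for them.
    """
    if len(tokens) <= size:
        return tokens, index
    start = max(0, min(index - size // 2, len(tokens) - size))
    return tokens[start:start + size], index - start


def top1_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of masked positions whose top-ranked prediction matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("no reference pronouns to score")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


tokens = [f"t{i}" for i in range(1000)]
window, pos = window_around(tokens, 900)
print(len(window), pos, window[pos])  # 512 412 t900
print(top1_accuracy(["she", "he", "they"], ["she", "he", "him"]))  # 2 of 3 correct
```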

	### How to use

	Use `fix_pronouns_in_text` from [pronoun_fixer.py](https://huggingface.co/thefrigidliquidation/roberta-base-pronouns/blob/main/pronoun_fixer.py). The file must be importable as `pronoun_fixer`, for example by saving it next to your own script.

	```python
	import torch
	from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

	import pronoun_fixer  # pronoun_fixer.py from this repository, saved next to this script

	# Text produced by sentence-level machine translation, where the pronoun was
	# ambiguous in the source language and came out wrong in the target language.
	MTL_TEXT = """
	Cadence Lee thought he was a normal girl, perhaps a little well to do, but not exceptionally so.
	"""

	# Use the GPU if one is available; otherwise fall back to the CPU.
	device = "cuda" if torch.cuda.is_available() else "cpu"
	pronoun_checkpoint = "thefrigidliquidation/roberta-base-pronouns"
	pronoun_model = AutoModelForMaskedLM.from_pretrained(pronoun_checkpoint).to(device)
	pronoun_tokenizer = AutoTokenizer.from_pretrained(pronoun_checkpoint)
	# top_k=10 returns the ten highest-scoring candidates for each mask.
	unmasker = FillMaskPipeline(model=pronoun_model, tokenizer=pronoun_tokenizer, device=device, top_k=10)

	fixed_text = pronoun_fixer.fix_pronouns_in_text(unmasker, pronoun_tokenizer, MTL_TEXT)

	print(fixed_text)
	# Cadence Lee thought she was a normal girl, perhaps a little well to do, but not exceptionally so.
	# "he" has been corrected to "she"
	```
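
The example sets `top_k=10`, presumably so that a pronoun can still be chosen when some of the highest-scoring candidates are not pronouns. The sketch below is a hypothetical helper, not part of `pronoun_fixer.py`, that picks the best pronoun from a fill-mask result for a single mask. It assumes the usual fill-mask pipeline output shape (a list of dicts with `score` and `token_str` keys) and a hypothetical `ALLOWED` pronoun set. RoBERTa token strings may carry a leading space, so they are stripped first.

```python
# Hypothetical set of pronouns the helper is allowed to substitute.
ALLOWED = {"he", "she", "him", "her", "his", "hers", "they", "them", "their"}


def best_pronoun(candidates: list[dict], original: str) -> str:
    """Return the highest-scoring candidate that is an allowed pronoun.

    `candidates` mirrors fill-mask pipeline output for one mask: dicts with
    at least "score" and "token_str". Falls back to `original` when no
    candidate is an allowed pronoun.
    """
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        word = cand["token_str"].strip()
        if word.lower() in ALLOWED:
            # Preserve the capitalization of the word being replaced.
            return word.capitalize() if original[:1].isupper() else word.lower()
    return original


fake_output = [
    {"score": 0.41, "token_str": " Cadence"},
    {"score": 0.37, "token_str": " she"},
    {"score": 0.12, "token_str": " he"},
]
print(best_pronoun(fake_output, "he"))  # she
```

As an alternative to filtering afterwards, the Transformers fill-mask pipeline also accepts a `targets` argument that restricts scoring to a given list of words.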