[Question] How to keep the model from translating unknown tokens?

#8
by Fransferdy - opened

For example, I have a text in which I want to preserve person names. Sometimes the model will translate John as João for Portuguese/Spanish, and I would rather keep it as John. Using Google Translate, Bing, or IBM Watson, I'm able to change known names to absurd tokens such as itaquabucetuba555, and they are usually preserved during translation. However, when I tried this with the Facebook model, it still tries to change the absurd tokens to something else.
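To make the placeholder trick concrete, here is a minimal sketch of the mask/translate/unmask round trip. The `XQZ<n>XQZ` token format and the `translate` stand-in are assumptions for illustration only; swap in the real model call, and note that nothing here guarantees the model leaves the tokens untouched.

```python
import re

def mask_names(text, names):
    """Replace each protected name with a unique placeholder token."""
    mapping = {}
    for i, name in enumerate(names):
        token = f"XQZ{i}XQZ"  # hypothetical token format, not a model feature
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = name
    return text, mapping

def unmask_names(text, mapping):
    """Put the original names back wherever the tokens survived."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text

def translate(text):
    # Stand-in for the real translation call (assumption): returns input as-is.
    return text

masked, mapping = mask_names("John gave Mary a book.", ["John", "Mary"])
restored = unmask_names(translate(masked), mapping)
```

If a token does not come back verbatim, `unmask_names` simply leaves the mangled text in place, so it is worth checking that every key in `mapping` still appears in the translated output before restoring.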

Is there a way to prevent the model from changing specific words?

How about wrapping these specific words in special tokens such as "$$ word $$"?

Did you find any robust solution? I tried different placeholders and many regexes to catch them again after translation, but I'm still not satisfied.
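For the "catch them back" step, one option is a regex that tolerates the small changes a model tends to make to a placeholder (case changes, stray spaces around the underscore). This is only a sketch: the `PH_<n>` format and both helper names are made up for the example, and it will not recover a placeholder the model rewrote completely.

```python
import re

# Matches PH_0, ph_0, "PH _ 0", "PH_ 0", etc. (hypothetical placeholder format)
PLACEHOLDER = re.compile(r"\bPH\s*_\s*(\d+)", re.IGNORECASE)

def restore(translated, originals):
    """Swap each surviving placeholder for the original word at that index."""
    def lookup(match):
        index = int(match.group(1))
        if index < len(originals):
            return originals[index]
        return match.group(0)  # unknown index: leave the text untouched
    return PLACEHOLDER.sub(lookup, translated)

def missing_placeholders(translated, originals):
    """Return the indices of placeholders that did not survive translation."""
    found = {int(m.group(1)) for m in PLACEHOLDER.finditer(translated)}
    return [i for i in range(len(originals)) if i not in found]
```

Calling `missing_placeholders` before `restore` gives a cheap way to flag sentences that need a retry or a manual look, instead of silently shipping a translation with a lost name.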

Hi Emre,
I remember I used something like 1_1_1_1 for special words, but sometimes it didn't work. For example:
"I have a gift for my 1_1_1_1"
If 1_1_1_1 stands for "wife", the word "my" will be translated as "ma" in French, but if 1_1_1_1 stands for "husband", it will be translated as "mon". The placeholder hides the information the model needs for gender agreement.
Have you tried googletrans?
@EkmekE
