Answer is always lowercase and has spaces added around special characters
The example on the right of the original model's page (https://huggingface.co/deutsche-telekom/electra-base-de-squad2) answers with the exact string it finds in the context.
With this model (for transformers.js), I always get the answer in lowercase. This wouldn't be a big issue if I got a start and end index along with the answer (so I could do something like text.slice(answer.start, answer.end)), but whether I get start and end seems random.
I also get extra spaces in the answer when special characters occur in the context.
text = 'Experte mit Cloud-Plattformen (z. B. AWS, Azure, Google Cloud)\n\nWende dich per Mail an [email protected]\nBeste Grüße\nKlaus-Peter Ulrich van Müller'
question = 'Wer hat den Text geschrieben?'
expected answer = 'Klaus-Peter Ulrich van Müller'
actual answer = 'klaus - peter ulrich van müller'
Am I doing something wrong?
Hi Claudia, thanks for the feedback, I'll check this and get back to you.
Could you please try again? I noticed that the original model already has do_lower_case set to true. I set it to false and re-created the ONNX files.
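For anyone following along, here is a rough sketch of how a do_lower_case style setting combined with punctuation splitting can produce output like the one reported above. This is a simplified simulation and not the actual transformers.js tokenizer; the function names basicTokenize and naiveDecode are made up for illustration, and the assumption that the extra spaces come from joining tokens with single spaces is mine.

```typescript
// Simplified illustration only; NOT the real transformers.js tokenizer.
// basicTokenize and naiveDecode are hypothetical names.
function basicTokenize(text: string, doLowerCase: boolean): string[] {
  const normalized = doLowerCase ? text.toLowerCase() : text;
  // Runs of letters/digits (including Latin letters with diacritics)
  // become one token; every other non-space character is its own token.
  const pattern = /[A-Za-z0-9\u00C0-\u024F]+|[^\sA-Za-z0-9\u00C0-\u024F]/g;
  return normalized.match(pattern) ?? [];
}

// Naive decoding: join tokens with single spaces, which is where the
// extra spaces around "-" come from in this sketch.
function naiveDecode(tokens: string[]): string {
  return tokens.join(' ');
}

const signature = 'Klaus-Peter Ulrich van Müller';
const lowered = naiveDecode(basicTokenize(signature, true));
const cased = naiveDecode(basicTokenize(signature, false));

console.log(lowered); // klaus - peter ulrich van müller
console.log(cased);   // Klaus - Peter Ulrich van Müller
```

In this sketch, turning lowercasing off restores the casing but not the original spacing, so character offsets into the context would still be needed to recover the exact string.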
Thank you very much for your efforts!
I am sorry for the late response. I thought I broke my code, but it's actually the model that cannot be loaded anymore. Here's the stack trace:
Uncaught (in promise) SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON
at JSON.parse (<anonymous>)
at getModelJSON (webpack-internal:///./node_modules/@xenova/transformers/src/utils/hub.js:597:17)
at async Promise.all (index 0)
at async loadTokenizer (webpack-internal:///./node_modules/@xenova/transformers/src/tokenizers.js:103:18)
at async AutoTokenizer.from_pretrained (webpack-internal:///./node_modules/@xenova/transformers/src/tokenizers.js:4390:50)
at async Promise.all (index 0)
at async loadItems (webpack-internal:///./node_modules/@xenova/transformers/src/pipelines.js:3107:5)
at async pipeline (webpack-internal:///./node_modules/@xenova/transformers/src/pipelines.js:3047:21)
at async QuestionAnsweringSingleton.getInstance (webpack-internal:///./shared/ai/questionAnswering.ts:20:29)
at async getContactName (webpack-internal:///./shared/ai/questionAnswering.ts:11:22)
When I only swap the model for another one, e.g. 'Xenova/distilbert-base-cased-distilled-squad', it works as expected.
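A note on reading the trace above: an "Unexpected token '<'" error from JSON.parse, with the text starting at "<!DOCTYPE", generally means an HTML page was received where a JSON file was expected. The sketch below shows one way to surface that more clearly in your own code. parseJsonResponse is a hypothetical helper, not part of transformers.js, and the url argument is only used for the error message.

```typescript
// Hypothetical helper, not part of @xenova/transformers.
// Parses a response body as JSON, but fails with a clearer message
// when the body looks like an HTML/XML document instead.
function parseJsonResponse(body: string, url: string): unknown {
  const trimmed = body.trim();
  if (trimmed.startsWith('<')) {
    throw new Error(
      'Expected JSON from ' + url + ' but received HTML/XML. ' +
        'The file may be missing or the request may have been redirected.'
    );
  }
  return JSON.parse(trimmed);
}
```

Typical use would be to read the response as text first and pass it through this helper, so that a missing file shows up as a readable error rather than a JSON syntax error.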
Thank you for rolling it back. I use a workaround for now:
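// Workaround: the model returns a lowercased answer with spaces around hyphens,
// e.g. "Karl-Heinz Müller" comes back as "karl - heinz müller".
// Replace every " - " (not only the first) so multi-hyphen names also match.
const senderAnswerModified = sender.answer.replace(/ - /g, '-');
const textModified = text.toLowerCase();
const startIndex = textModified.indexOf(senderAnswerModified);
// Use the length of the modified answer so the slice does not overshoot.
const endIndex = startIndex + senderAnswerModified.length;
// Fall back to the raw model answer if it cannot be located in the text.
const result = startIndex >= 0 ? text.slice(startIndex, endIndex) : sender.answer;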
Would love to hear if I can drop this workaround :) THANKS in advance!
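In case it helps others who hit the same issue, here is a slightly more general variant of the workaround above. It is only a sketch: recoverOriginalSpan and escapeRegExp are hypothetical helper names, and it assumes the model's answer differs from the context only in letter case and in whitespace between tokens. It builds a case-insensitive pattern from the answer's tokens and allows optional whitespace between them, so it also covers special characters other than "-".

```typescript
// Hypothetical helpers; a sketch, not a definitive fix.
function escapeRegExp(s: string): string {
  // Escape characters that have a special meaning in a regular expression.
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

interface RecoveredSpan {
  text: string;  // exact substring of the context
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

function recoverOriginalSpan(context: string, answer: string): RecoveredSpan | null {
  const tokens = answer
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map(escapeRegExp);
  if (tokens.length === 0) return null;

  // Allow zero or more whitespace characters between tokens and ignore case.
  const pattern = new RegExp(tokens.join('\\s*'), 'i');
  const match = pattern.exec(context);
  if (match === null) return null;

  return { text: match[0], start: match.index, end: match.index + match[0].length };
}

const context = 'Beste Grüße\nKlaus-Peter Ulrich van Müller';
const span = recoverOriginalSpan(context, 'klaus - peter ulrich van müller');
console.log(span === null ? 'not found' : span.text);
```

Because whitespace between tokens is optional in the pattern, this can over-match in rare cases (for example, two answer words that appear glued together in the context), so it is best treated as a fallback for when start and end offsets are not available.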