long texts are not labelled to the end
#5, opened by valentinbeuze
If I copy and paste your default text ten times ("Apple est créée le 1er avril..."), the output is incomplete: the last paragraphs are not labelled.
Any idea why? Is it related to a predefined maximum number of words at inference time?
Do I have to cut my text into blocks to use your model?
Thanks
Hello Valentin,
There is indeed a predefined maximum number of tokens for each model. For CamemBERT models this is around 500 tokens (512, special tokens included). Since the tokenizer can split one word into several tokens, the number of words you can process depends on how many tokens each word is split into (I would guess probably around 100 to 200 words).
You can find models that handle more tokens, but there will always be a limit.
So yes, I would recommend splitting your text into blocks before running inference.
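As a rough illustration of that splitting step, here is a minimal sketch that greedily packs whole sentences into blocks under a token budget. The function name `split_into_chunks`, the `max_tokens=400` default, and the whitespace word count used as a stand-in for the real token count are all assumptions made for this example, not part of the model or its library. With a real model you would pass a counter based on its own tokenizer, and keep the budget comfortably below the model limit.

```python
import re
from typing import Callable, List, Tuple


def split_into_chunks(
    text: str,
    max_tokens: int = 400,  # hypothetical budget, kept below the model limit
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[Tuple[int, str]]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Returns (start_offset, chunk_text) pairs so that entity positions found
    in a chunk can be mapped back to the original text.
    The default count_tokens is a crude word count (an assumption); with a
    Hugging Face tokenizer you could pass
    lambda s: len(tokenizer.tokenize(s)) instead.
    """
    # Locate sentence spans: a sentence ends at ., ! or ? followed by whitespace.
    spans = []
    start = 0
    for match in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))

    chunks = []
    cur_start = cur_end = None
    for s, e in spans:
        if cur_start is None:
            cur_start, cur_end = s, e
        elif count_tokens(text[cur_start:e]) <= max_tokens:
            # The sentence still fits: extend the current chunk.
            cur_end = e
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append((cur_start, text[cur_start:cur_end]))
            cur_start, cur_end = s, e
    if cur_start is not None:
        chunks.append((cur_start, text[cur_start:cur_end]))
    # Note: a single sentence longer than max_tokens is kept as one
    # oversized chunk; it would need a further word-level split.
    return chunks
```

Each chunk can then be sent to the model separately, adding the chunk's start offset to the character positions of the entities it returns.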
Thanks,
Jean-Baptiste
Jean-Baptiste changed discussion status to closed