arxiv:2406.17148

Unambiguous Recognition Should Not Rely Solely on Natural Language Training

Published on Jun 24, 2024

Abstract

This paper identifies a "bias" problem in LaTeX text recognition with Transformer-based architectures: for instance, e-t is frequently misrecognized as e^{-t}. The bias stems from the inherent characteristics of the dataset. To mitigate it, we propose a printed LaTeX text recognition model trained on a mixed dataset of pseudo-formulas and pseudo-text. The model uses a Swin Transformer as the encoder and a RoBERTa model as the decoder. Experimental results show that this approach reduces the bias and improves the accuracy and robustness of text recognition. For clear images, the model strictly follows the image content; for blurred images, it combines image and contextual information to produce reasonable recognition results.
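To make the idea of a mixed pseudo-formula and pseudo-text dataset concrete, here is a minimal sketch of how such label strings might be generated. The abstract does not describe the actual generation procedure, so everything here is an assumption for illustration: the token pools, the function names (`pseudo_formula`, `pseudo_text`, `mixed_samples`), the superscript probability, and the mixing ratio are all hypothetical. The point is only that randomly assembled strings contain both `e - t` and `e^{-t}` style patterns without a natural-language prior favoring one over the other.

```python
import random

# Hypothetical token pools; the paper's actual vocabulary is not given in the abstract.
SYMBOLS = ["a", "b", "e", "t", "x", "y", "1", "2", "\\alpha", "\\beta"]
BINARY_OPS = ["+", "-", "=", "\\times"]
WORDS = ["the", "model", "image", "text", "bias", "clear", "blurred"]


def pseudo_formula(rng, n_terms=4):
    """Build a random, syntactically valid but meaningless LaTeX formula."""
    parts = []
    for i in range(n_terms):
        term = rng.choice(SYMBOLS)
        # Sometimes attach a superscript, so inline and superscript
        # forms both occur with no semantic preference between them.
        if rng.random() < 0.3:
            sign = rng.choice(["", "-"])
            term += "^{" + sign + rng.choice(SYMBOLS) + "}"
        parts.append(term)
        if i < n_terms - 1:
            parts.append(rng.choice(BINARY_OPS))
    return " ".join(parts)


def pseudo_text(rng, n_words=5):
    """Build a random word sequence with no grammatical structure."""
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


def mixed_samples(n, formula_ratio=0.5, seed=0):
    """Return n (kind, label) pairs mixing pseudo-formulas and pseudo-text."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < formula_ratio:
            samples.append(("formula", pseudo_formula(rng)))
        else:
            samples.append(("text", pseudo_text(rng)))
    return samples


if __name__ == "__main__":
    for kind, label in mixed_samples(5):
        print(kind, "|", label)
```

In a full pipeline, each label would be rendered to an image and paired with its string as a training example; that rendering step is outside the scope of this sketch.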
