So far I have noticed that reasoning with LLMs in English tends to be more accurate than in other languages. However, besides GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. Third-party framework installation
2. Text chunking
3. Lack of support for meta-annotations such as spans, objects, etc.
To cope with the problem of IR from non-English texts, I am happy to share bulk-translate 0.25.0.
bulk-translate is a tiny, no-strings Python framework for translating a series of texts with pre-annotated fixed spans that the translator leaves unchanged.
It provides a Python API for quick data translation with (optionally) annotated objects in texts (see figure below). I have made it as accessible as possible for RAG and/or LLM-powered downstream apps: https://github.com/nicolay-r/bulk-translate/wiki
All you have to do is provide an iterator of texts, where each text is either:
1. A string object
2. A list of strings and nested lists that represent spans (value + any ID data).
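To make the input format concrete, here is a minimal sketch of what such an iterator could look like and how fixed spans can stay invariant. It is not the bulk-translate API (see the wiki for that): the helper `translate_keeping_spans` is a hypothetical name, `str.upper` stands in for a real translator, and translating each string fragment separately is a simplifying assumption.

```python
from typing import Callable, Iterable, Iterator, List, Union

# A text is either a plain string, or a list that mixes strings with
# nested lists. Each nested list is a span: value first, then any ID data.
Span = list  # e.g. ["Paris", "LOC-1"]
Text = Union[str, List[Union[str, Span]]]


def translate_keeping_spans(
    texts: Iterable[Text], translate: Callable[[str], str]
) -> Iterator[Text]:
    """Hypothetical helper: translate string parts, pass spans through as-is."""
    for text in texts:
        if isinstance(text, str):
            # Case 1: a plain string is translated as a whole.
            yield translate(text)
            continue
        translated = []
        for part in text:
            if isinstance(part, str):
                # Free text between spans goes to the translator.
                translated.append(translate(part))
            else:
                # Case 2: a nested list is a fixed span, kept untouched.
                translated.append(part)
        yield translated


# Usage with a stand-in "translator" (assumption: any str -> str callable).
texts = [
    "hello world",
    ["the capital is ", ["Paris", "LOC-1"], " indeed"],
]
for item in translate_keeping_spans(texts, str.upper):
    print(item)
```

Because the helper is a generator, it consumes the input lazily, which fits the "iterator of texts" idea: nothing forces the whole dataset into memory at once.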