This model is based on LaBSE. It expects UTF-8 input for most languages; the exception is Sanskrit, which is expected in IAST transliteration.
The training objective was sentence similarity for information retrieval and bitext alignment tasks.
It handles Tibetan, Buddhist Chinese, Sanskrit (IAST), and Pāli (IAST).
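
As a rough illustration of how sentence embeddings are typically used for bitext alignment, the sketch below scores every source sentence against every target sentence by cosine similarity and keeps the best match. It is a minimal sketch under stated assumptions, not this model's documented pipeline: the helper names `cosine_similarity` and `align_bitext` are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings. Loading the model and encoding sentences is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_bitext(source_vecs, target_vecs):
    """For each source vector, return (index of the most similar target, its score)."""
    pairs = []
    for s in source_vecs:
        scores = [cosine_similarity(s, t) for t in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((best, scores[best]))
    return pairs


# Toy vectors standing in for sentence embeddings (assumption: real
# embeddings would be produced by the model, one vector per sentence).
source_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2]]
target_vecs = [[0.1, 0.9, 0.1], [0.9, 0.1, 0.0]]

alignment = align_bitext(source_vecs, target_vecs)
print([idx for idx, _ in alignment])  # [1, 0]
```

Greedy best-match is the simplest choice here; it allows several source sentences to map to the same target, so a real alignment task may need a score threshold or a one-to-one matching step on top.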