Batyr Tokenizer

         ___
 ,_    .'   `.
| `'--/  o    \
\     |   o    |
 \    \   o    /
  `.   '.____.'
    `-._____

Overview

Batyr Tokenizer is a specialized natural language processing tool for tokenizing text in Kazakh and other Turkic languages. Named after Batyr, a term for heroes in Kazakh folklore, it is built to handle the linguistic features of these languages, such as agglutinative morphology.

Features

  • Support for Kazakh and other Turkic languages
  • Handles agglutinative morphology
  • Recognizes and preserves culturally significant terms
  • Configurable for different dialects and writing systems
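
To illustrate what handling agglutinative morphology involves, here is a minimal toy sketch in Python. It is not the Batyr Tokenizer API: the function `split_suffixes`, the `SUFFIXES` list, and the `min_stem` parameter are hypothetical names introduced only for this example, and the three suffixes are a tiny hand-picked sample rather than a real Kazakh suffix inventory. The sketch greedily peels known suffixes off the end of a word, which is roughly the kind of segmentation a morphology-aware tokenizer has to perform.

```python
# Toy illustration only: this is NOT the Batyr Tokenizer API.
# Hypothetical, hand-picked sample of Kazakh suffixes:
# plural, first-person-plural possessive, locative.
SUFFIXES = ("тер", "іміз", "де")


def split_suffixes(word, suffixes=SUFFIXES, min_stem=3):
    """Greedily peel known suffixes off the end of a word.

    Returns the remaining stem followed by the suffixes in surface order.
    A suffix is only removed if at least `min_stem` characters remain.
    """
    parts = []
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so a short one cannot shadow a long one.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                parts.insert(0, suffix)
                word = word[: -len(suffix)]
                stripped = True
                break
    return [word] + parts


if __name__ == "__main__":
    # "мектептерімізде" means "in our schools".
    print(split_suffixes("мектептерімізде"))
```

A real tokenizer would need a far larger suffix inventory plus rules for vowel harmony and consonant assimilation; the greedy loop above only shows why a single surface word can map to several tokens.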

Installation

pip install batyr-tokenizer

Happy tokenizing with Batyr! 🐎📚

Safetensors model size: 8.03B params (tensor type: BF16)
