This is a RoBERTa-based model configuration for Bodo. It does not contain pretrained model checkpoints. It contains only two things:

  • Byte-level BPE tokenizer for Bodo
  • RoBERTa base configuration
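
Because the repository ships no checkpoint, a model built from its configuration starts with randomly initialized weights and must be trained before it is useful. The following is a minimal sketch of that step, assuming the repository exposes a standard `config.json` that `AutoConfig` can load; the function name `build_untrained_model` is hypothetical and not part of this repository.

```python
def build_untrained_model(repo_id="alayaran/bodo-roberta-base"):
    """Build a masked-LM model from the repo's configuration only.

    Assumption: the repo provides a config loadable by AutoConfig.
    No pretrained weights are loaded; parameters are randomly initialized.
    """
    # Imported lazily so that defining this function does not require transformers.
    from transformers import AutoConfig, AutoModelForMaskedLM

    config = AutoConfig.from_pretrained(repo_id)
    # from_config creates the architecture without downloading any checkpoint.
    return AutoModelForMaskedLM.from_config(config)
```

Calling `build_untrained_model()` needs network access to fetch the configuration; the returned model is a starting point for pretraining, not a usable language model.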

Usage

You can use the tokenizer as follows:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('alayaran/bodo-roberta-base')

# Double quotes are required here because the text itself contains an apostrophe.
t = tokenizer("कौटि नख'राव दैनि कानेक्सन होबाय")

# {'input_ids': [310, 294, 313, 267, 503, 11, 268, 263, 277, 298, 287, 265, 267, 321, 263, 265, 272, 310, 273, 378, 295, 266, 271, 263, 269], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

# To decode the IDs back to text:

tokenizer.decode(t['input_ids'], skip_special_tokens=True)

# "कौटि नख'राव दैनि कानेक्सन होबाय"
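
A byte-level BPE tokenizer builds its tokens from the UTF-8 bytes of the text rather than from whole characters, so any input can be encoded without an unknown token. The sketch below uses only the standard library to show those base byte values before any merges are applied. The helper name `utf8_bytes` is hypothetical, and this illustrates the idea only; it is not the tokenizer's actual implementation.

```python
def utf8_bytes(text):
    """Return the UTF-8 byte values of text, the base symbols of byte-level BPE."""
    return list(text.encode("utf-8"))


# "\u0915" is the Devanagari letter KA; it occupies three bytes in UTF-8.
print(utf8_bytes("\u0915"))  # [224, 164, 149]
# An ASCII letter occupies a single byte.
print(utf8_bytes("a"))  # [97]
```

Since each Devanagari character spans three bytes, the learned merges are what keep Bodo token sequences reasonably short.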