xlnet-large-bahasa-cased

A pretrained XLNet large language model for Malay.

Pretraining Corpus

The xlnet-large-bahasa-cased model was pretrained on ~1.4 billion words. The training data consisted of:

  1. Cleaned local texts.
  2. The Pile, translated.

Pretraining details

Load Pretrained Model

To use this model, install torch or tensorflow together with the Hugging Face transformers library. The XLNetTokenizer class also needs the sentencepiece package. You can then initialize the model directly:

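from transformers import XLNetModel, XLNetTokenizer

# Download the pretrained weights from the Hugging Face Hub.
model = XLNetModel.from_pretrained('malay-huggingface/xlnet-large-bahasa-cased')
# The model is cased, so keep the tokenizer from lowercasing its input.
tokenizer = XLNetTokenizer.from_pretrained(
    'malay-huggingface/xlnet-large-bahasa-cased',
    do_lower_case=False,
)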
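
Once the model and tokenizer are loaded, one possible way to use them is to turn sentences into fixed-size vectors. The sketch below assumes the PyTorch backend and uses mean pooling over the last hidden states. Mean pooling is an illustrative choice here, not something this model card prescribes, and the helper names `mean_pool` and `embed` are hypothetical.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    Hypothetical helper, not part of transformers.
    last_hidden_state: (batch, seq_len, hidden) float tensor.
    attention_mask:    (batch, seq_len) tensor, 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Clamp so a fully padded row divides by 1 instead of 0.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(texts, model, tokenizer):
    """Return one pooled vector per input text (hypothetical helper)."""
    # Pad to the longest text in the batch and return PyTorch tensors.
    inputs = tokenizer(texts, return_tensors='pt', padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs['attention_mask'])
```

Calling `embed(['Saya suka makan nasi lemak.'], model, tokenizer)` with the objects loaded above should return a tensor with one row per input text. Because the attention mask is applied before averaging, padded positions do not affect the result.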