Model Card: banELECTRA-Base

Model Details

The banELECTRA-Base model is a Bangla adaptation of ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), a pre-training method for language models introduced by researchers at Google. ELECTRA uses a training strategy called replaced token detection: a small generator corrupts the input by substituting some tokens, and a discriminator learns to predict, for every position, whether the token is original or a replacement. This differs from traditional masked language modeling (MLM) methods such as BERT. After pre-training, only the discriminator is fine-tuned on downstream tasks, making ELECTRA a more efficient alternative to BERT that achieves strong performance with fewer parameters.
The banELECTRA-Base model is tailored to Bangla text and can be fine-tuned for tasks such as Named Entity Recognition (NER), Part-of-Speech (POS) tagging, Sentence Similarity, and Paraphrase Identification. The model was trained on two NVIDIA A40 GPUs.
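
The replaced token detection objective can be exercised directly through the discriminator's pre-training head. The following is a minimal sketch, assuming the published checkpoint retains that head (this card does not confirm it); positive logits indicate tokens the discriminator considers replaced.

# Minimal sketch of ELECTRA's replaced token detection, assuming the checkpoint
# ships the discriminator's pre-training head (not confirmed by this card).
import torch
from transformers import ElectraTokenizer, ElectraForPreTraining

model_name = "banglagov/banELECTRA-Base"
tokenizer = ElectraTokenizer.from_pretrained(model_name)
discriminator = ElectraForPreTraining.from_pretrained(model_name)

text = "এর ফলে আগামী বছর বেকারত্বের হার বৃদ্ধি এবং অর্থনৈতিক মন্দার আশঙ্কায় ইউরোপীয় ইউনিয়ন ।"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # One logit per token; higher values mean the discriminator believes the
    # token was replaced by the generator rather than being original.
    logits = discriminator(**inputs).logits

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, logits[0].tolist()):
    print(f"{token}\t{score:.3f}")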

Training Data

The banELECTRA-Base model was pre-trained on a 32 GB Bangla text dataset. Below are the dataset statistics:

  • Total Words: ~1.996 billion
  • Unique Words: ~21.24 million
  • Total Sentences: ~165.38 million
  • Total Documents: ~15.62 million

Model Architecture and Training

The banELECTRA-Base model was trained using the official ELECTRA repository with carefully selected hyperparameters to optimize performance for Bangla text. The model uses a vocabulary size of 50,000 tokens and consists of 12 hidden layers with 768 hidden dimensions and 12 attention heads in the discriminator. The generator is scaled to one-third the size of the discriminator, and training is conducted with a maximum sequence length of 256. The training employed a batch size of 96, a learning rate of 0.0004 with 10,000 warm-up steps, and a total of 1,000,000 training steps. Regularization techniques, such as a dropout rate of 0.1 and a weight decay of 0.01, were applied to improve generalization.
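
For illustration, these hyperparameters could be expressed as the JSON dictionary that the official google-research/electra run_pretraining.py accepts via its --hparams flag. The key names below follow that public codebase and are not taken from this card; a dropout of 0.1 matches the encoder's default.

# Illustrative sketch only: hyperparameters in the format used by the official
# ELECTRA codebase. Key names follow that repository, not this card.
import json

hparams = {
    "model_size": "base",            # 12 layers, 768 hidden dims, 12 attention heads
    "vocab_size": 50000,
    "max_seq_length": 256,
    "train_batch_size": 96,
    "learning_rate": 4e-4,
    "num_warmup_steps": 10000,
    "num_train_steps": 1000000,
    "generator_hidden_size": 1 / 3,  # generator scaled to 1/3 of the discriminator
    "weight_decay_rate": 0.01,
}

# e.g. python3 run_pretraining.py --data-dir <corpus> --model-name banELECTRA-Base \
#        --hparams '<the JSON printed below>'
print(json.dumps(hparams))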

How to Use

from transformers import ElectraTokenizer, ElectraForSequenceClassification

model_name = "banglagov/banELECTRA-Base"

# Load the tokenizer and the discriminator with a sequence-classification head
# (the head is newly initialized and should be fine-tuned before use).
tokenizer = ElectraTokenizer.from_pretrained(model_name)
model = ElectraForSequenceClassification.from_pretrained(model_name)

text = "এর ফলে আগামী বছর বেকারত্বের হার বৃদ্ধি এবং অর্থনৈতিক মন্দার আশঙ্কায় ইউরোপীয় ইউনিয়ন ।"

# Tokenize the sentence into input IDs and attention masks
inputs = tokenizer(text, return_tensors="pt")

print("Input token IDs:", inputs)

Experimental Results

The banELECTRA-Base model demonstrates strong performance on downstream tasks, as shown below:

Task                              Precision   Recall   F1
Named Entity Recognition (NER)    0.8842      0.7930   0.8249
Part-of-Speech (POS) Tagging      0.8757      0.8717   0.8706

These results were obtained using the banELECTRA-Base model within a Noisy Label model architecture.
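
The card does not name the evaluation tooling. For tagging tasks such as NER, precision, recall, and F1 are commonly computed over BIO label sequences, for example with seqeval; the snippet below uses toy labels purely for illustration.

# Illustrative only: computing precision/recall/F1 for a tagging task with
# seqeval (the card does not state which evaluation tool was actually used).
from seqeval.metrics import precision_score, recall_score, f1_score

y_true = [["B-PER", "O", "B-LOC", "O"]]  # gold BIO labels (toy example)
y_pred = [["B-PER", "O", "O", "O"]]      # model predictions (toy example)

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))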
