metadata
language: bn
tags:
  - collaborative
  - bengali
  - NER
license: apache-2.0
datasets: xtreme
metrics:
  - Loss
  - Accuracy
  - Precision
  - Recall

sahajBERT Named Entity Recognition

Model description

sahajBERT fine-tuned for named entity recognition (NER) on the Bengali split of WikiANN.

Named Entities predicted by the model:

| Label id | Label |
|:--------:|:-----:|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
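
The label ids above follow the BIO scheme (B- opens an entity, I- continues it, O is outside any entity). As a minimal sketch of how a sequence of predicted ids could be turned into entity spans, the helper below groups consecutive tags; `ID2LABEL` and `decode_spans` are illustrative names, not part of the model or of `transformers`, and the lenient handling of a stray I- tag is an assumption rather than the authors' method.

```python
from typing import List, Tuple

# Mapping copied from the label table above.
ID2LABEL = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}


def decode_spans(label_ids: List[int]) -> List[Tuple[str, int, int]]:
    """Group BIO label ids into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current_type, start = None, 0
    for i, label_id in enumerate(label_ids):
        prefix, _, entity_type = ID2LABEL[label_id].partition("-")
        # An I- tag of the same type extends the span that is currently open.
        if prefix == "I" and entity_type == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
        if prefix in ("B", "I"):
            # Assumption: a stray I- tag with no matching open span starts a new one.
            current_type, start = entity_type, i
        else:
            current_type = None
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans


print(decode_spans([1, 2, 0, 5, 6, 6, 0, 3]))
# [('PER', 0, 2), ('LOC', 3, 6), ('ORG', 7, 8)]
```

The positions are token indices, so mapping them back to characters still requires the tokenizer's offsets.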

Intended uses & limitations

How to use

You can use this model directly with a pipeline for token classification:

from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
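
The pipeline returns a list of dictionaries, one per predicted token, with keys such as `entity`, `score` and `word`. The sketch below shows one possible way to keep only confident entity predictions. The helper name `keep_confident`, the 0.5 threshold, and the sample predictions are all hypothetical (the sample is not real model output), and the code assumes the `entity` field carries the tag names from the label table; depending on the model configuration it may instead contain generic names such as `LABEL_5`.

```python
from typing import Dict, List


def keep_confident(predictions: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Drop "O" tokens and predictions scoring below min_score."""
    return [
        p for p in predictions
        if p["entity"] != "O" and p["score"] >= min_score
    ]


# Hypothetical predictions for illustration only, NOT real model output.
sample = [
    {"entity": "B-LOC", "score": 0.91, "word": "ঢাকা"},
    {"entity": "O", "score": 0.99, "word": "একটি"},
    {"entity": "B-PER", "score": 0.32, "word": "রহিম"},
]

for prediction in keep_confident(sample):
    print(prediction["word"], prediction["entity"], prediction["score"])
```

With a loaded model, `keep_confident(output)` would be applied to the `output` variable from the snippet above.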

Limitations and bias

WIP

Training data

The model was initialized with the pre-trained weights of sahajBERT at step 2489 and fine-tuned on the Bengali split of WikiANN.

Training procedure

Coming soon!

Eval results

accuracy: 0.9291424418604651

f1: 0.8475143403441683

loss: 0.2975200116634369

precision: 0.8254189944134078

recall: 0.8708251473477406
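
The card does not say how the F1 score was computed. Assuming it is the usual harmonic mean of precision and recall, the reported values are consistent with each other, as this short check suggests (`f1_score` is an illustrative helper, not a library function):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the eval results above.
precision = 0.8254189944134078
recall = 0.8708251473477406

f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8475
```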

BibTeX entry and citation info

Coming soon!