---
language: bn
tags:
- collaborative
- bengali
- NER
license: apache-2.0
datasets: xtreme
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT Named Entity Recognition

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT-NER) fine-tuned for named entity recognition (NER) on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

Named entities predicted by the model:

| Label id | Label |
|:--------:|:----:|
|0 |O|
|1 |B-PER|
|2 |I-PER|
|3 |B-ORG|
|4 |I-ORG|
|5 |B-LOC|
|6 |I-LOC|

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for token classification:

```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Bengali input text; change me
output = pipeline(raw_text)
```

#### Limitations and bias

Work in progress.

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT-NER) at step 2489 and trained on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

## Training procedure

Coming soon!

## Eval results

| Metric | Value |
|:------:|:-----:|
|accuracy |0.9291424418604651|
|f1 |0.8475143403441683|
|loss |0.2975200116634369|
|precision |0.8254189944134078|
|recall |0.8708251473477406|

### BibTeX entry and citation info

Coming soon!
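
### Decoding label ids

The label table in the model description uses a BIO scheme: `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks tokens outside any entity. The sketch below shows one way to merge per-token label ids into entity spans. `group_entities` is a hypothetical helper written for this illustration and is not part of `transformers`. It assumes you already have one label id per token, and the English tokens and ids in the example are made up rather than taken from the model.

```python
# Label ids as listed in the model description table.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}


def group_entities(tokens, label_ids):
    """Merge BIO-tagged tokens into (entity_type, text) spans.

    Hypothetical helper for illustration; not part of transformers.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            # A B- tag, or an I- tag of a different type, starts a new span.
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return spans


# Made-up example: tokens and label ids are illustrative, not model output.
tokens = ["Rabindranath", "Tagore", "was", "born", "in", "Kolkata"]
label_ids = [1, 2, 0, 0, 0, 5]
print(group_entities(tokens, label_ids))
# [('PER', 'Rabindranath Tagore'), ('LOC', 'Kolkata')]
```

Joining tokens with a space is a simplification. With subword tokenizers you would typically rebuild the text from character offsets instead.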