---
library_name: transformers
tags: [Fill Mask, Persian, BERT]
---

## Model Details

### Model Description

This model is fine-tuned for masked language modeling in Persian: it predicts a missing word in a Persian sentence when that word is replaced by the tokenizer's mask token. This makes it useful for a range of NLP applications, including text completion, question answering, and contextual understanding of Persian text.

- **Developed by:** Behpouyan
- **Model type:** Encoder
- **Language(s) (NLP):** Persian
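
A quick way to try the model is the `fill-mask` pipeline from `transformers`. The sketch below assumes the checkpoint id `Behpouyan/Behpouyan-Fill-Mask` shown on this card and inserts the tokenizer's own mask token, so it works regardless of whether the model expects `[MASK]` or `<mask>`.

```python
from transformers import pipeline

# Build a fill-mask pipeline for the checkpoint named on this card
fill_mask = pipeline("fill-mask", model="Behpouyan/Behpouyan-Fill-Mask")

# Use the model's own mask token so the placeholder always matches the tokenizer
sentence = f"این کتاب بسیار {fill_mask.tokenizer.mask_token} است."  # "This book is very [MASK]."

# Print the three most likely completions with their scores
for prediction in fill_mask(sentence, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```
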
## How to Get Started with the Model

For finer control over tokenization and decoding, the model can also be called directly:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Behpouyan/Behpouyan-Fill-Mask")
model = AutoModelForMaskedLM.from_pretrained("Behpouyan/Behpouyan-Fill-Mask")

# Five Persian sentences, each with one word replaced by the tokenizer's mask token.
# Using tokenizer.mask_token keeps the example valid whether the model expects [MASK] or <mask>.
mask = tokenizer.mask_token
sentences = [
    f"این کتاب بسیار {mask} است.",            # This book is very [MASK].
    f"مشتری همیشه از {mask} شما راضی است.",    # The customer is always satisfied with your [MASK].
    f"من به دنبال {mask} هستم.",              # I am looking for [MASK].
    f"این پروژه نیاز به {mask} دارد.",         # This project needs [MASK].
    f"تیم ما برای انجام کارها {mask} است.",    # Our team is [MASK] to do the tasks.
]

# Predict the most likely word for the masked position in a sentence
def predict_masked_word(sentence):
    # Tokenize the input sentence
    inputs = tokenizer(sentence, return_tensors="pt")

    # Forward pass to get the logits over the vocabulary
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits

    # Find the position of the mask token (each sentence contains exactly one)
    mask_token_index = torch.where(inputs.input_ids == tokenizer.mask_token_id)[1].item()

    # Take the highest-scoring token at that position and decode it back to text
    predicted_token_id = torch.argmax(logits[0, mask_token_index]).item()
    predicted_word = tokenizer.decode([predicted_token_id])

    return predicted_word

# Test the model on the sentences
for sentence in sentences:
    predicted_word = predict_masked_word(sentence)
    print(f"Sentence: {sentence}")
    print(f"Predicted word: {predicted_word}")
    print("-" * 50)
```
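
The `predict_masked_word` function above returns only the single highest-scoring candidate. To get several suggestions per masked position, the same logits can be ranked with `torch.topk`. This is a small sketch that reuses `tokenizer`, `model`, and `mask` from the snippet above; the helper name `predict_top_k` is illustrative, not part of the model's API.

```python
import torch

def predict_top_k(sentence, k=5):
    # Same forward pass as predict_masked_word, but keep the k best candidates
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Position of the mask token (each example sentence contains exactly one)
    mask_token_index = torch.where(inputs.input_ids == tokenizer.mask_token_id)[1].item()

    # Rank the vocabulary at that position and decode the top-k token ids
    top_ids = torch.topk(logits[0, mask_token_index], k).indices.tolist()
    return [tokenizer.decode([token_id]) for token_id in top_ids]

print(predict_top_k(f"این کتاب بسیار {mask} است."))  # five candidate completions
```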