--- library_name: transformers tags: [Fill Mask, Persian , BERT] --- ## Model Details ### Model Description This model is fine-tuned for the task of masked language modeling in Persian. The model can predict missing words in Persian sentences when a word is replaced by the [MASK] token. It is useful for a range of NLP applications, including text completion, question answering, and contextual understanding of Persian texts. - **Developed by:** Behpouyan - **Model type:** Encoder - **Language(s) (NLP):** Persian ## How to Get Started with the Model ``` python from transformers import AutoTokenizer, AutoModelForMaskedLM import torch # Load the tokenizer and model tokenizer = AutoTokenizer.from_pretrained("Behpouyan/Behpouyan-Fill-Mask") model = AutoModelForMaskedLM.from_pretrained("Behpouyan/Behpouyan-Fill-Mask") # List of 5 Persian sentences with a masked word (replacing a word with [MASK]) sentences = [ "این کتاب بسیار <mask> است.", # The book is very <mask "مشتری همیشه از <mask> شما راضی است.", # The customer is always satisfied with your <mask "من به دنبال <mask> هستم.", # I am looking for <mask "این پروژه نیاز به <mask> دارد.", # This project needs <mask "تیم ما برای انجام کارها <mask> است." # Our team is <mask to do the tasks ] # Function to predict masked words def predict_masked_word(sentence): # Tokenize the input sentence inputs = tokenizer(sentence, return_tensors="pt") # Forward pass to get logits with torch.no_grad(): outputs = model(**inputs) logits = outputs.logits # Get the position of the [MASK] token mask_token_index = torch.where(inputs.input_ids == tokenizer.mask_token_id)[1].item() # Get the predicted token predicted_token_id = torch.argmax(logits[0, mask_token_index]).item() predicted_word = tokenizer.decode([predicted_token_id]) return predicted_word # Test the model on the sentences for sentence in sentences: predicted_word = predict_masked_word(sentence) print(f"Sentence: {sentence}") print(f"Predicted word: {predicted_word}") print("-" * 50) ```