# Pashto BERT (BERT-Base)

## Model Overview

This is a monolingual **Pashto BERT (BERT-Base)** model pretrained on a large **Pashto corpus**. The model is designed to understand and represent **Pashto** text, making it suitable for a variety of downstream **Natural Language Processing (NLP) tasks**.

## Model Details

- **Architecture:** BERT-Base (12 layers, 768 hidden size, 12 attention heads, ~110M parameters)
- **Language:** Pashto (`ps`)
- **Training Corpus:** A diverse collection of Pashto text, including news articles, books, and web content.
- **Special Tokens:** `[CLS]`, `[SEP]`, `[PAD]`, `[MASK]`, `[UNK]`

## Intended Use

This model can be **fine-tuned** for various Pashto-specific NLP tasks, such as:

- **Sequence Classification:** Sentiment analysis, topic classification, and document categorization.
- **Sequence Tagging:** Named entity recognition (NER) and part-of-speech (POS) tagging.
- **Text Understanding & Generation Support:** Question answering, text summarization, and machine translation (typically as the encoder in an encoder-decoder setup).

## How to Use

The model can be loaded with the Hugging Face `transformers` library. A quick masked-token example is also included at the end of this card.

```python
from transformers import AutoModel, AutoTokenizer

model_name = "your-huggingface-username/pashto-bert-base"

# Load the tokenizer and model from the same Hub repository.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "ستاسو نننۍ ورځ څنګه وه؟"  # "How was your day today?"
tokens = tokenizer(text, return_tensors="pt")
out = model(**tokens)
```

## Training Details

- **Optimizer:** AdamW
  - **Epsilon:** 1e-8
  - **Betas:** (0.9, 0.999)
- **Sequence Length:** 128
- **Warmup Steps:** 10,000
- **Warmup Ratio:** 0.06
- **Learning Rate:** 1e-4
- **Weight Decay:** 0.01
- **Gradient Accumulation Steps:** 1
- **Max Gradient Norm:** 1.0
- **Scheduler:** `linear_schedule_with_warmup`

## Limitations & Biases

- The model may reflect biases present in the training data.
- Performance on **low-resource or domain-specific tasks** may require additional fine-tuning.
- It is not trained for **code-switching scenarios** (e.g., mixing Pashto with English or other languages).
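
## Example: Masked Token Prediction

Because BERT-style models are pretrained with a masked-language-modeling objective, you can sanity-check the checkpoint by predicting `[MASK]` tokens directly. The following is a minimal sketch using the `fill-mask` pipeline; it assumes the uploaded checkpoint retains the MLM head from pretraining and that `your-huggingface-username/pashto-bert-base` is replaced with the actual repository ID.

```python
from transformers import pipeline

# Hypothetical repository ID; substitute the actual model ID on the Hub.
model_name = "your-huggingface-username/pashto-bert-base"

# The fill-mask pipeline loads the model together with its MLM head
# (this assumes the published checkpoint includes that head).
fill_mask = pipeline("fill-mask", model=model_name)

# Ask the model to fill in the [MASK] token in a Pashto sentence
# ("How was your [MASK] today?").
predictions = fill_mask("ستاسو نننۍ [MASK] څنګه وه؟")

# Each prediction is a dict with the proposed token and its probability.
for prediction in predictions:
    print(prediction["token_str"], round(prediction["score"], 3))
```

If the top predictions look plausible for in-domain Pashto sentences, the checkpoint is loading correctly and is ready to be fine-tuned on the tasks listed above.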