Evheniia/bert_ner

+---
+language:
+- en
+base_model:
+- google-bert/bert-large-uncased
+pipeline_tag: token-classification
+---
+# Model Card for Mountain NER Model
+## Model Summary
+This model is a BERT-based Named Entity Recognition (NER) model fine-tuned to identify mountain names in text. It assigns a BIO label to every token and handles both single-word and multi-word mountain names (e.g., "Kilimanjaro" or "Rocky Mountains").
+## Intended Use
+- **Task**: Named Entity Recognition (NER) for mountain name identification.
+- **Input**: A text string containing one or more sentences or paragraphs.
+- **Output**: A list of tokens, each annotated with one of the following labels:
+  - `B-MOUNTAIN`: Beginning of a mountain name.
+  - `I-MOUNTAIN`: Inside (continuation of) a mountain name.
+  - `O`: Outside of any mountain entity.
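To make the labeling scheme concrete, the following is a minimal sketch of how BIO tags can be grouped back into full mountain names. The helper name `extract_mountains` and the sample sentence are illustrative assumptions, not part of the model's repository, and the labels are hand-written rather than real model output.

```python
from typing import List


def extract_mountains(tokens: List[str], labels: List[str]) -> List[str]:
    """Group B-MOUNTAIN / I-MOUNTAIN tagged tokens into full mountain names."""
    entities: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B-MOUNTAIN":
            # A new entity starts; flush any entity still in progress.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif label == "I-MOUNTAIN" and current:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # "O" (or a stray I- tag with no open entity) closes the entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hand-labeled example (hypothetical, for illustration only).
tokens = ["The", "Rocky", "Mountains", "and", "Kilimanjaro", "are", "famous", "."]
labels = ["O", "B-MOUNTAIN", "I-MOUNTAIN", "O", "B-MOUNTAIN", "O", "O", "O"]
print(extract_mountains(tokens, labels))  # ['Rocky Mountains', 'Kilimanjaro']
```

Here `B-MOUNTAIN` always opens a new name, so two adjacent mountain names are kept apart even when no `O` token separates them.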
+## How to Use
+You can load this model using the Hugging Face `transformers` library:
+```python
+import torch
+from transformers import BertForTokenClassification, BertTokenizer
+
+# Replace the placeholder with the actual model repository ID.
+model_id = "your_username/your_model"
+tokenizer = BertTokenizer.from_pretrained(model_id)
+model = BertForTokenClassification.from_pretrained(model_id)
+model.eval()  # make sure the model is in inference mode (dropout off)
+
+text = "Kilimanjaro is one of the most famous mountains."
+inputs = tokenizer(text, return_tensors="pt")
+
+# Run the model without tracking gradients.
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Pick the highest-scoring label for every token (including [CLS] and [SEP]).
+predictions = torch.argmax(outputs.logits, dim=-1)[0].tolist()
+tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
+labels = [model.config.id2label[label_id] for label_id in predictions]
+print(list(zip(tokens, labels)))
+```
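The snippet above yields one label per WordPiece token, including the special `[CLS]` and `[SEP]` tokens and `##` continuation pieces. The following sketch shows one way to collapse these back to word-level predictions. It assumes each word takes the label of its first piece, which is a common convention but not something the model card confirms. The helper `merge_wordpieces` is hypothetical, and the sample tokens and labels are hand-written; the real split and labels depend on the tokenizer and the model.

```python
from typing import List, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collapse WordPiece tokens into words, keeping each word's first-piece label."""
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue  # special tokens carry no entity information
        if token.startswith("##") and words:
            # Continuation piece: append it to the previous word, keep that word's label.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words


# Hand-written illustration, not actual tokenizer or model output.
tokens = ["[CLS]", "mount", "ever", "##est", "is", "high", ".", "[SEP]"]
labels = ["O", "B-MOUNTAIN", "I-MOUNTAIN", "I-MOUNTAIN", "O", "O", "O", "O"]
print(merge_wordpieces(tokens, labels))
```

Alternatively, the `transformers` token-classification `pipeline` offers an `aggregation_strategy` argument that performs a similar grouping.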
+## Dataset
+The dataset consists of 500 text examples annotated with mountain names in BIO format, split as follows:
+- **Training Set**: 350 examples
+- **Validation Set**: 75 examples
+- **Test Set**: 75 examples
+The dataset was created by combining known mountain names with sentences containing them.
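As an illustration of the approach described above, the sketch below fills sentence templates with mountain names and emits BIO labels. It is only a guess at what such a process could look like, not the author's actual pipeline: the templates, the mountain list, the whitespace tokenization, and the helper `tag_sentence` are all assumptions.

```python
from typing import List, Tuple


def tag_sentence(template: str, mountain: str) -> Tuple[List[str], List[str]]:
    """Fill a template with a mountain name and return (tokens, BIO labels)."""
    tokens: List[str] = []
    labels: List[str] = []
    for word in template.split():
        if word == "{mountain}":
            # The first word of the name gets B-, any remaining words get I-.
            name_parts = mountain.split()
            tokens.extend(name_parts)
            labels.extend(["B-MOUNTAIN"] + ["I-MOUNTAIN"] * (len(name_parts) - 1))
        else:
            tokens.append(word)
            labels.append("O")
    return tokens, labels


# Hypothetical templates and names, pre-tokenized by whitespace.
templates = [
    "We hiked across the {mountain} last summer .",
    "{mountain} is covered in snow .",
]
mountains = ["Rocky Mountains", "Kilimanjaro"]

dataset = [tag_sentence(t, m) for t in templates for m in mountains]
for tokens, labels in dataset:
    print(list(zip(tokens, labels)))
```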
+## Limitations
+- The model is specifically designed for mountain names and may not generalize to other named entities.
+- Performance may degrade on noisy or informal text.
+- Predictions are made per WordPiece token, so multi-word or rare mountain names may be split into several tokens whose labels must be merged to recover the full name.
+## Repository
+<https://github.com/Yevheniia-Ilchenko/Bert_NER>
+## Training Details
+The model was fine-tuned using the **BERT Base Uncased** architecture for token classification. Below are the training details:
+- **Model Architecture**: BERT for Token Classification (`bert-base-uncased`).
+- **Dataset**: Custom-labeled dataset in BIO format for mountain name recognition.
+- **Hyperparameters**:
+  - **Learning Rate**: `2e-4`
+  - **Batch Size**: `16`
+  - **Maximum Sequence Length**: `128`
+  - **Number of Epochs**: `3`
+  - **Optimizer**: AdamW
+  - **Warmup Steps**: `500`
+  - **Weight Decay**: `0.01`
+- **Evaluation Strategy**: Steps-based evaluation with automatic saving of the best model.
+- **Training Arguments**:
+  - `save_total_limit=3`: Limits the number of saved checkpoints.
+  - `load_best_model_at_end=True`: Ensures the best model is used after training.
+- **Training Performance**:
+  - **Training Runtime**: `570.44 seconds`
+  - **Training Samples per Second**: `1.841`
+  - **Training Steps per Second**: `0.116`
+  - **Final Training Loss**: `0.4017`
+- **Evaluation Metrics**:
+  - **Evaluation Loss**: `0.0839`
+  - **Precision**: `97.11%`
+  - **Recall**: `96.89%`
+  - **F1 Score**: `96.91%`
+  - **Evaluation Runtime**: `13.76 seconds`
+  - **Evaluation Samples per Second**: `5.449`
+  - **Evaluation Steps per Second**: `0.726`
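The reported figures can be cross-checked with a little arithmetic. The sketch below derives the number of optimizer steps from the dataset size and batch size, then compares it with the reported runtime statistics and the warmup setting. It assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these are stated in the card.

```python
import math

# Values taken from the training details above.
train_examples = 350
batch_size = 16
epochs = 3
warmup_steps = 500
train_runtime_s = 570.44
reported_steps_per_second = 0.116
reported_samples_per_second = 1.841

# Steps implied by the dataset size (assumes no gradient accumulation).
steps_per_epoch = math.ceil(train_examples / batch_size)  # 22
total_steps = steps_per_epoch * epochs                    # 66

# Steps and samples implied by the reported throughput.
steps_from_runtime = train_runtime_s * reported_steps_per_second
samples_from_runtime = train_runtime_s * reported_samples_per_second

print(f"total steps (derived):    {total_steps}")
print(f"total steps (reported):   {steps_from_runtime:.1f}")
print(f"samples seen (reported):  {samples_from_runtime:.0f}")
print(f"warmup exceeds training:  {warmup_steps > total_steps}")
```

Both estimates agree on roughly 66 optimizer steps, which is well below the 500 warmup steps. If a linear warmup schedule was used, the learning rate would still have been ramping up when training ended and would not have reached the nominal `2e-4`.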