---
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
library_name: transformers
---

## Dataset Used

This model was trained on the [CoNLL 2003 dataset](https://huggingface.co/datasets/eriktks/conll2003) for Named Entity Recognition (NER) tasks. The dataset includes the following labels:

- `O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-LOC`, `I-LOC`, `B-MISC`, `I-MISC`

For detailed descriptions of these labels, please refer to the [dataset card](https://huggingface.co/datasets/eriktks/conll2003).

## Model Training Details

### Training Arguments

- **Model Architecture**: `bert-base-cased` for token classification
- **Learning Rate**: `2e-5`
- **Number of Epochs**: `20`
- **Weight Decay**: `0.01`
- **Evaluation Strategy**: `epoch`
- **Save Strategy**: `epoch`

*Additional default parameters from the Hugging Face Transformers library were used; a minimal sketch of this setup appears at the end of this card.*

## Evaluation Results

### Validation Set Performance

- **Overall Metrics**:
  - Precision: 94.44%
  - Recall: 95.74%
  - F1 Score: 95.09%
  - Accuracy: 98.73%

#### Per-Label Performance

| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 97.27%    | 97.11% | 97.19%   |
| MISC        | 87.46%    | 91.54% | 89.45%   |
| ORG         | 93.37%    | 93.44% | 93.40%   |
| PER         | 96.02%    | 98.15% | 97.07%   |

### Test Set Performance

- **Overall Metrics**:
  - Precision: 89.90%
  - Recall: 91.91%
  - F1 Score: 90.89%
  - Accuracy: 97.27%

#### Per-Label Performance

| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 92.87%    | 92.87% | 92.87%   |
| MISC        | 75.55%    | 82.76% | 78.99%   |
| ORG         | 88.32%    | 90.61% | 89.45%   |
| PER         | 95.28%    | 96.23% | 95.75%   |

## How to Use the Model

You can load the model directly from the Hugging Face Model Hub:

```python
from transformers import pipeline

# Replace with your specific model checkpoint
model_checkpoint = "Prikshit7766/bert-finetuned-ner"
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",
)

# Example usage
result = token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn.")
print(result)
```

### Example Output

```python
[
    {
        "entity_group": "PER",
        "score": 0.9999881,
        "word": "Sylvain",
        "start": 11,
        "end": 18
    },
    {
        "entity_group": "ORG",
        "score": 0.99961376,
        "word": "Hugging Face",
        "start": 33,
        "end": 45
    },
    {
        "entity_group": "LOC",
        "score": 0.99989843,
        "word": "Brooklyn",
        "start": 49,
        "end": 57
    }
]
```
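## Reproducing the Training Setup

The arguments listed under Model Training Details map onto a standard Hugging Face `Trainer` run. Below is a minimal sketch of such a run, not the exact script used to train this model: the output directory is a placeholder, the label-alignment scheme (masking special tokens and continuation subwords with `-100`) is an assumption, and some argument names differ across `transformers` versions (`evaluation_strategy` is `eval_strategy` in recent releases, and `Trainer(tokenizer=...)` has been renamed to `processing_class=...`).

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("eriktks/conll2003")
# Label names come straight from the dataset features: O, B-PER, I-PER, ...
label_names = raw["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(label_names),
    id2label=dict(enumerate(label_names)),
    label2id={label: i for i, label in enumerate(label_names)},
)

def tokenize_and_align(batch):
    # CoNLL tags are word-level, so they must be realigned to BERT's
    # subword tokens. Special tokens and continuation subwords get the
    # ignore index -100 (one common scheme; an assumption here).
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels, prev = [], None
        for wid in word_ids:
            labels.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

tokenized = raw.map(
    tokenize_and_align, batched=True, remove_columns=raw["train"].column_names
)

# Hyperparameters from the Training Arguments section above;
# output_dir is a hypothetical path.
args = TrainingArguments(
    output_dir="bert-finetuned-ner",
    learning_rate=2e-5,
    num_train_epochs=20,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```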