---
language:
- en
tags:
- token-classification
- address-NER
- NER
- bert-base-uncased
datasets:
- Ultra Fine Entity Typing
metrics:
- Precision
- Recall
- F1 Score
widget:
- text: "Hi, I am Kermit and I live in Berlin"
- text: "I come from India"
- text: "ML6 is a very cool company from Belgium"
---

## City-Country-NER

A `bert-base-uncased` model finetuned on a custom dataset to detect `Country` and `City` names in a given sentence.

### Custom Dataset

We used weak supervision to annotate the [Ultra-Fine Entity Typing](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html) dataset with `City` and `Country` labels, then applied extra preprocessing to remove false labels.

The model predicts three tags: `OTHER`, `CITY` and `COUNTRY`.
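
The exact weak-labeling procedure is not described here. As a rough illustration of how sentences could be tagged with these three labels without manual annotation, the sketch below uses simple gazetteer matching. The `CITIES` and `COUNTRIES` sets and the `weak_label` helper are hypothetical and are not taken from the actual training pipeline.

```python
# Hypothetical gazetteers; a real setup would use much larger lists.
CITIES = {"berlin", "london", "new york"}
COUNTRIES = {"india", "belgium", "germany"}


def weak_label(tokens, max_len=3):
    """Assign OTHER / CITY / COUNTRY to each token by longest gazetteer match."""
    tags = ["OTHER"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "new york" beats "york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(t.lower() for t in tokens[i:i + n])
            if phrase in CITIES:
                label = "CITY"
            elif phrase in COUNTRIES:
                label = "COUNTRY"
            else:
                continue
            tags[i:i + n] = [label] * n
            i += n
            matched = True
            break
        if not matched:
            i += 1
    return tags


tokens = "I moved from India to New York".split()
labels = weak_label(tokens)
print(list(zip(tokens, labels)))
```

Labels produced this way are noisy (for example, a person named "Paris" would be tagged as a city), which is one reason a cleanup pass to remove false labels is useful.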

### How to use the finetuned model?

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "ml6team/bert-base-uncased-city-country-ner"

# Add token=True (use_auth_token=True on older transformers releases)
# if the repository requires authentication.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges word pieces into whole entity spans
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(nlp("My name is Kermit and I live in London."))
```
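
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below shows one way to collect the detected cities and countries from such a list. The `group_entities` helper and the `min_score` threshold are hypothetical, the `sample` list is a hand-written stand-in rather than real model output, and the label strings are assumed to match the tag names listed above.

```python
from collections import defaultdict


def group_entities(predictions, wanted=("CITY", "COUNTRY"), min_score=0.5):
    """Collect predicted words per entity group, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["entity_group"] in wanted and pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output; not real model predictions.
sample = [
    {"entity_group": "CITY", "score": 0.99, "word": "london", "start": 32, "end": 38},
    {"entity_group": "COUNTRY", "score": 0.30, "word": "kermit", "start": 11, "end": 17},
]
print(group_entities(sample))
```

In practice, `sample` would be replaced by the return value of `nlp(...)`, and the threshold tuned to the precision/recall trade-off the application needs.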