Fusion NER Models
AI & ML interests: NLP, NER
Here you can find the NER models for the Fusion project.
NER Models
Here you can find a description of each of our models. Each row contains the model nickname, a training description, the model path (link), the source dataset (with link), the base model and the entity types.
Results
We tested our models on the IAHALT test set. For comparison, we also evaluated other models, such as DictaBert and HeBert. The performance results are:
Model name | Precision | Recall | F1-score | Time (s) |
---|---|---|---|---|
IAHALT_and_NEMO_PP | 0.714 | 0.353 | 0.461 | 83.128 |
HeBert | 0.574 | 0.474 | 0.494 | 86.483 |
NEMO | 0.553 | 0.51 | 0.525 | 81.422 |
IAHALT_and_NEMO | 0.692 | 0.678 | 0.684 | 83.702 |
Vitaly | 0.883 | 0.794 | 0.836 | 83.773 |
DictaBert | 0.916 | 0.834 | 0.872 | 70.465 |
DICTA_large | 0.917 | 0.845 | 0.879 | 206.251 |
Name-Sentences | 0.895 | 0.865 | 0.879 | 82.674 |
Basic | 0.897 | 0.866 | 0.881 | 84.479 |
Smart_Injection | 0.898 | 0.867 | 0.881 | 82.253 |
DICTA_Basic | 0.903 | 0.875 | 0.888 | 69.419 |
DICTA_Large_Smart | 0.904 | 0.875 | 0.889 | 204.324 |
DICTA_Small_Smart | 0.904 | 0.875 | 0.889 | 70.29 |
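Based on these results, we recommend the DICTA_Small_Smart model. It shares the best F1-score (0.889) with DICTA_Large_Smart but runs in roughly a third of the time (70.29 s vs. 204.324 s).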
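The recommendation follows a simple selection rule: take the highest F1-score and, among tied models, prefer the shorter run time. The sketch below applies that rule to the numbers in the results table. The tie-breaking rule is our reading of the recommendation, and `Result` and `recommend` are illustrative names that are not part of the project.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    precision: float
    recall: float
    f1: float
    seconds: float


# Rows copied from the results table (IAHALT test set).
RESULTS = [
    Result("IAHALT_and_NEMO_PP", 0.714, 0.353, 0.461, 83.128),
    Result("HeBert", 0.574, 0.474, 0.494, 86.483),
    Result("NEMO", 0.553, 0.51, 0.525, 81.422),
    Result("IAHALT_and_NEMO", 0.692, 0.678, 0.684, 83.702),
    Result("Vitaly", 0.883, 0.794, 0.836, 83.773),
    Result("DictaBert", 0.916, 0.834, 0.872, 70.465),
    Result("DICTA_large", 0.917, 0.845, 0.879, 206.251),
    Result("Name-Sentences", 0.895, 0.865, 0.879, 82.674),
    Result("Basic", 0.897, 0.866, 0.881, 84.479),
    Result("Smart_Injection", 0.898, 0.867, 0.881, 82.253),
    Result("DICTA_Basic", 0.903, 0.875, 0.888, 69.419),
    Result("DICTA_Large_Smart", 0.904, 0.875, 0.889, 204.324),
    Result("DICTA_Small_Smart", 0.904, 0.875, 0.889, 70.29),
]


def recommend(results):
    """Return the row with the highest F1-score, breaking ties by shorter run time."""
    return min(results, key=lambda r: (-r.f1, r.seconds))


best = recommend(RESULTS)
print(best.model)  # DICTA_Small_Smart
```

F1-score alone leaves DICTA_Large_Smart and DICTA_Small_Smart tied. The run-time column is what separates them.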
Hebrew NLP models
The following table lists Hebrew NLP models:
Model name | Link | Creator |
---|---|---|
HeNLP/HeRo | https://huggingface.co/HeNLP/HeRo | Vitaly Shalumov and Harel Haskey |
dicta-il/dictabert | https://huggingface.co/dicta-il/dictabert | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
dicta-il/dictabert-large | https://huggingface.co/dicta-il/dictabert-large | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
avichr/heBERT | https://huggingface.co/avichr/heBERT | Avihay Chriqui and Inbal Yahav |
Footnotes
[1] Name-Sentences:
Adding sentences to the corpus that contain only the entity we want the network to learn.
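Here is a minimal sketch of the Name-Sentences idea. It assumes a whitespace-tokenized corpus stored as `(tokens, tags)` pairs with BIO labels. Both that corpus format and the `name_sentences` helper are hypothetical and do not come from the project's code.

```python
def name_sentences(entities, label):
    """Turn each entity string into a one-entity training sentence with BIO tags."""
    examples = []
    for entity in entities:
        tokens = entity.split()
        if not tokens:  # skip empty strings
            continue
        # First token starts the entity, the rest continue it.
        tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
        examples.append((tokens, tags))
    return examples


corpus = []  # existing (tokens, tags) pairs would go here
corpus.extend(name_sentences(["מרדכי אנילביץ", "רודולף הס"], "PER"))
```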
[2] Entity-Injection:
Replacing a tagged entity in the original corpus with a new entity. With this method, the model can learn new entities (not new labels) that it did not extract before.
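Entity injection can be sketched the same way, again assuming whitespace tokens with BIO tags. The hypothetical `inject_entity` helper swaps the tokens of one tagged entity for a new surface form and keeps the original label. The model therefore sees a new entity without a new label.

```python
def inject_entity(tokens, tags, start, end, new_entity):
    """Replace the entity at tokens[start:end] with new_entity, keeping its label."""
    if not tags[start].startswith("B-"):
        raise ValueError("start must point at a B- tag")
    new_tokens = new_entity.split()
    if not new_tokens:
        raise ValueError("new_entity must contain at least one token")
    label = tags[start][2:]  # e.g. "B-PER" -> "PER"
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1)
    # Build new lists; the input sentence is left unchanged.
    return (tokens[:start] + new_tokens + tokens[end:],
            tags[:start] + new_tags + tags[end:])


tokens = ["רודולף", "הס", "נאם"]  # "Rudolf Hess gave a speech"
tags = ["B-PER", "I-PER", "O"]
new_tokens, new_tags = inject_entity(tokens, tags, 0, 2, "מרדכי אנילביץ")
```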
[3] BI-BI Problem:
When building a training corpus, consecutive entities of the same type may be labeled as continuations of one another. For example, the text "הארי פוטר ורון וויזלי" ("Harry Potter and Ron Weasley") would be tagged as a single entity. This problem prevents the model from extracting entities correctly.
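The effect is easiest to see when BIO tags are decoded into spans. The sketch below assumes BIO tagging and whitespace tokenization, so the conjunction prefix ו stays attached to ורון. `bio_spans` is a hypothetical helper, not a library function. Tagging the four tokens B-I-I-I yields one merged entity, while the intended B-I-B-I yields two.

```python
def bio_spans(tokens, tags):
    """Decode BIO tags into a list of (label, text) entity spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs from the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open span
        else:  # "O" closes any open span
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


tokens = "הארי פוטר ורון וויזלי".split()
merged = bio_spans(tokens, ["B-PER", "I-PER", "I-PER", "I-PER"])  # BI-BI problem
split = bio_spans(tokens, ["B-PER", "I-PER", "B-PER", "I-PER"])   # intended tagging
```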
[4] Classic:
The classic NER types:
Entity type | Full name | Examples |
---|---|---|
PER | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ |
GPE | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה |
LOC | Location | מזרח אירופה, אגן הים התיכון, הגליל |
FAC | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן |
ORG | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב |
TIMEX | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 |
EVE | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד |
TTL | Title | פיהרר, קיסר, מנכ"ל |
ANG | Language | עברית, ערבית, גרמנית |
DUC | Product | פייסבוק, F-16, תנובה |
WOA | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 |
MISC | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין |
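A token-classification head needs an explicit label list. Under the BIO scheme (an assumption here), the twelve classic types expand to 25 labels: `O` plus a `B-` and an `I-` tag per type. The sketch builds `label2id` and `id2label` dictionaries in the form that Hugging Face `transformers` model configs accept. The label order is arbitrary and is not necessarily the one used by the published checkpoints.

```python
CLASSIC_TYPES = ["PER", "GPE", "LOC", "FAC", "ORG", "TIMEX",
                 "EVE", "TTL", "ANG", "DUC", "WOA", "MISC"]


def bio_label_maps(entity_types):
    """Build label2id / id2label maps for a BIO tag set over the given types."""
    labels = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


label2id, id2label = bio_label_maps(CLASSIC_TYPES)
print(len(label2id))  # 25
```

Such dictionaries can be passed as `id2label=` and `label2id=` keyword arguments when loading a model with `AutoModelForTokenClassification.from_pretrained`.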
Datasets for English NER (used to clean incorrect entities from English texts):
MIT License