---
language: pl
license: mit
tags:
- ner
datasets:
- clarin-pl/kpwr-ner
metrics:
- f1
- accuracy
- precision
- recall
widget:
- text: "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
  example_title: "Example"
---

# FastPDN

FastPolDeepNer is a model for Named Entity Recognition, designed for ease of use, training, and configuration. It is the successor of [PolDeepNer2](https://gitlab.clarin-pl.eu/information-extraction/poldeepner2). The model implements a data-processing and training pipeline built on Hydra, PyTorch, PyTorch Lightning, and Transformers.

Source code: https://gitlab.clarin-pl.eu/grupa-wieszcz/ner/fast-pdn

## How to use

Here is how to use this model to extract named entities from text:

```python
from transformers import pipeline

ner = pipeline('ner', model='clarin-pl/FastPDN', aggregation_strategy='simple')

text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
ner_results = ner(text)
for output in ner_results:
    print(output)

# {'entity_group': 'nam_liv_person', 'score': 0.9996054, 'word': 'Jan Kowalski', 'start': 12, 'end': 24}
# {'entity_group': 'nam_loc_gpe_city', 'score': 0.998931, 'word': 'Wrocławiu', 'start': 39, 'end': 48}
```

Here is how to use this model to get the logits for every token in the text:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("clarin-pl/FastPDN")
model = AutoModelForTokenClassification.from_pretrained("clarin-pl/FastPDN")

text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
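The `output.logits` tensor holds one score per label for every token; the predicted tag for a token is the argmax over the label dimension, mapped to a name via `model.config.id2label`. A minimal, self-contained sketch of that decoding step, using toy logits and an illustrative label map (the real mapping lives in `model.config.id2label`):

```python
# Decoding sketch: map per-token logits to tag names via argmax.
# The id2label mapping below is illustrative; in practice use
# model.config.id2label from the loaded FastPDN model.
id2label = {0: "O", 1: "B-nam_liv_person", 2: "I-nam_liv_person"}

# Toy logits: one row of label scores per token (3 tokens, 3 labels).
logits = [
    [2.5, 0.1, -1.0],  # highest score at index 0 -> "O"
    [0.2, 3.1, 0.0],   # highest score at index 1 -> "B-nam_liv_person"
    [-0.5, 0.3, 2.2],  # highest score at index 2 -> "I-nam_liv_person"
]

def decode(logits, id2label):
    """Pick the argmax label id for each token and map it to its tag name."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits]

print(decode(logits, id2label))  # ['O', 'B-nam_liv_person', 'I-nam_liv_person']
```

With the real model output, the same step is an argmax over `output.logits[0]` followed by the `id2label` lookup; special tokens (such as `<s>` and `</s>`) should be skipped when aligning tags back to words.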

## Training data

The FastPDN model was trained on the KPWr and CEN datasets (their 82-class versions). The annotation guidelines are specified [here](https://clarin-pl.eu/dspace/bitstream/handle/11321/294/WytyczneKPWr-jednostkiidentyfikacyjne.pdf).

## Pretraining

FastPDN models were fine-tuned from the following pretrained models:
- [herbert-base-cased](https://huggingface.co/allegro/herbert-base-cased)
- [distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1)

## Evaluation

Results for runs trained on `cen_n82` and `kpwr_n82`:

| name      | test/f1 | test/pdn2_f1 | test/acc | test/precision | test/recall |
|-----------|---------|--------------|----------|----------------|-------------|
| distiluse | 0.53    | 0.61         | 0.95     | 0.55           | 0.54        |
| herbert   | 0.68    | 0.78         | 0.97     | 0.70           | 0.69        |

## Authors

- Grupa Wieszcze CLARIN-PL
- Wiktor Walentynowicz

## Contact

- Norbert Ropiak ([email protected])
|