---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
widget:
- text: >-
    SELECT shipping FROM users WHERE shipping = '201 Thayer St Providence RI
    02912'
license: mit
datasets:
- beki/privy
---

| Feature | Description |
| --- | --- |
| **Name** | `en_spacy_pii_distilbert` |
| **Version** | `0.0.0` |
| **spaCy** | `>=3.4.1,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | Trained on a new [dataset for structured PII](https://huggingface.co/datasets/beki/privy) generated by [Privy](https://github.com/pixie-io/pixie/tree/main/src/datagen/pii/privy). For more details, see this [blog post](https://blog.px.dev/detect-pii/) |
| **License** | MIT |
| **Author** | [Benjamin Kilimnik](https://www.linkedin.com/in/benkilimnik/) |

---

## English PII in Flair

This is the large 5-class NER model for English, trained on protocol trace data generated by [Privy](https://github.com/pixie-io/pixie/tree/main/src/datagen/pii/privy/).

F1-Score: **0.9522**

Predicts 5 tags:

| **tag** | **meaning** |
|---------------------------------|-----------|
| PER | person name |
| LOC | location name |
| ORG | organization name |
| DATE_TIME | dates and times |
| NRP | nationalities, religious and political groups |

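
As a rough illustration of how the five tags above might be consumed, the sketch below masks labeled character spans in a string. The `redact` and `predict_spans` helpers are hypothetical names introduced here, not part of this model or of spaCy. `predict_spans` assumes the pipeline has been installed as a spaCy package under the name shown in the table (`en_spacy_pii_distilbert`), and the span in the demo is hand-written for illustration, not real model output.

```python
from typing import Iterable, List, Tuple

# The five labels listed in the tag table.
PII_LABELS = {"PER", "LOC", "ORG", "DATE_TIME", "NRP"}

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace each labeled character span with a <LABEL> placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in PII_LABELS or start < cursor:
            continue  # skip unknown labels and spans overlapping an earlier one
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def predict_spans(text: str, model_name: str = "en_spacy_pii_distilbert") -> List[Span]:
    """Assumption: the pipeline is installed as a spaCy package under this name."""
    import spacy  # imported lazily so redact() works without spaCy installed

    nlp = spacy.load(model_name)
    return [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]


if __name__ == "__main__":
    query = "SELECT shipping FROM users WHERE shipping = '201 Thayer St Providence RI 02912'"
    # Hand-written span for illustration only, not real model output.
    start = query.index("201")
    end = query.index("'", start)
    print(redact(query, [(start, end, "LOC")]))
```

With the model installed, `redact(text, predict_spans(text))` would apply the same masking to whatever entities the pipeline actually predicts.
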
Uses DistilBERT embeddings.

---