---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
widget:
- text: >-
    SELECT shipping FROM users WHERE shipping = '201 Thayer St Providence RI
    02912'
license: mit
datasets:
- beki/privy
---
| Feature | Description |
| --- | --- |
| **Name** | `en_spacy_pii_distilbert` |
| **Version** | `0.0.0` |
| **spaCy** | `>=3.4.1,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | Trained on a new [dataset for structured PII](https://huggingface.co/datasets/beki/privy) generated by [Privy](https://github.com/pixie-io/pixie/tree/main/src/datagen/pii/privy). For more details, see this [blog post](https://blog.px.dev/detect-pii/) |
| **License** | MIT |
| **Author** | [Benjamin Kilimnik](https://www.linkedin.com/in/benkilimnik/) |
---

## English PII in spaCy

This is the large 5-class NER model for English, trained on protocol trace data generated by [Privy](https://github.com/pixie-io/pixie/tree/main/src/datagen/pii/privy/).

F1-Score: **0.9522**

Predicts 5 tags:

| **Tag** | **Meaning** |
| --- | --- |
| PER | person name |
| LOC | location name |
| ORG | organization name |
| DATE_TIME | dates and times |
| NRP | nationalities, religious and political groups |

Uses DistilBERT embeddings.
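
The five tags listed above could be used to mask PII in query text along the following lines. This is a minimal sketch, not the author's documented usage: it assumes the pipeline has been installed as a Python package loadable under the name `en_spacy_pii_distilbert`, and the helpers `find_pii` and `redact` are hypothetical names introduced here for illustration.

```python
from typing import Iterable, List, Tuple

# The five labels listed in the tag table of this model card.
PII_LABELS = {"PER", "LOC", "ORG", "DATE_TIME", "NRP"}

# (start_char, end_char, label)
Span = Tuple[int, int, str]


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace each labelled character span with a <LABEL> placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        if label in PII_LABELS:
            out = out[:start] + f"<{label}>" + out[end:]
    return out


def find_pii(text: str, model_name: str = "en_spacy_pii_distilbert") -> List[Span]:
    """Run the spaCy pipeline and return its entity spans.

    Assumption: the model package is installed locally under `model_name`.
    """
    import spacy  # imported lazily so redact() works without spaCy installed

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
```

With the model installed, `redact(query, find_pii(query))` would return the query with each detected entity replaced by its tag. Which spans are detected depends on the model's predictions, so no output is shown here.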

---