metadata
license: apache-2.0
base_model: bert-base-cased
tags:
  - PII
  - NER
  - Bert
  - Token Classification
datasets:
  - generator
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: pii_model
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: generator
          type: generator
          config: default
          split: train
          args: default
        metrics:
          - name: Precision
            type: precision
            value: 0.954751
          - name: Recall
            type: recall
            value: 0.965233
          - name: F1
            type: f1
            value: 0.959964
          - name: Accuracy
            type: accuracy
            value: 0.991199
pipeline_tag: token-classification
language:
  - en

Personally Identifiable Information (PII) Model

This model is a fine-tuned version of bert-base-cased on the generator dataset. It achieves the following results:

  • Training Loss: 0.003900
  • Validation Loss: 0.051071
  • Precision: 95.53%
  • Recall: 96.60%
  • F1: 96%
  • Accuracy: 99.11%

Model description

This is a token classification model for detecting personally identifiable information (PII). It is built on the BERT architecture (bert-base-cased) and fine-tuned on a custom dataset. The model assigns each token an entity label, covering names, addresses, dates of birth, and more, so that sensitive information in text can be identified and protected.

The model can detect the following entity groups

  • ACCOUNTNUMBER
  • FIRSTNAME
  • ACCOUNTNAME
  • PHONENUMBER
  • CREDITCARDCVV
  • CREDITCARDISSUER
  • PREFIX
  • LASTNAME
  • AMOUNT
  • DATE
  • DOB
  • COMPANYNAME
  • BUILDINGNUMBER
  • STREET
  • SECONDARYADDRESS
  • STATE
  • EMAIL
  • CITY
  • CREDITCARDNUMBER
  • SSN
  • URL
  • USERNAME
  • PASSWORD
  • COUNTY
  • PIN
  • MIDDLENAME
  • IBAN
  • GENDER
  • AGE
  • ZIPCODE
  • SEX
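The entity groups above are the labels the model attaches to spans of text. As a minimal sketch of how they might be used for redaction, the helper below replaces each detected span with its group name. It assumes the input is shaped like the output of the Transformers token-classification pipeline with span aggregation enabled (dicts carrying `entity_group`, `start`, and `end`). The names `redact` and `load_pii_pipeline` and the `ab-ai/pii_model` repository id are assumptions made for illustration, not something this card states.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [GROUP] placeholder."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


def load_pii_pipeline(model_id: str = "ab-ai/pii_model"):
    """Build the pipeline lazily; model_id is an assumed repository id."""
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )


# Hand-written spans standing in for model output (not real predictions).
sample = "Contact John in Paris"
spans = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "CITY", "start": 16, "end": 21},
]
print(redact(sample, spans))  # prints: Contact [FIRSTNAME] in [CITY]
```

With the model available, `redact(text, load_pii_pipeline()(text))` would apply the same masking to real predictions.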

Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Number of Training Epochs | 7 |
| Weight Decay | 0.01 |
| Save Strategy | Epoch |
| Load Best Model at End | True |
| Metric for Best Model | F1 |
| Push to Hub | True |
| Evaluation Strategy | Epoch |
| Early Stopping Patience | 3 |
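For readers who want to set up a comparable run, the values above might map onto Hugging Face `Trainer` settings roughly as follows. This is a sketch, not the author's training script: the `output_dir` value is a placeholder, the batch sizes are assumed to be per-device, and the argument names follow Transformers 4.38 (newer releases use `eval_strategy` in place of `evaluation_strategy`).

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
TRAINING_KWARGS = dict(
    output_dir="pii_model",            # placeholder path, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=16,    # assumed to be a per-device value
    per_device_eval_batch_size=16,     # assumed to be a per-device value
    num_train_epochs=7,
    weight_decay=0.01,
    save_strategy="epoch",
    evaluation_strategy="epoch",       # must match save_strategy for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    push_to_hub=True,
)

EARLY_STOPPING_PATIENCE = 3


def build_training_setup():
    """Create TrainingArguments and callbacks; imports transformers lazily."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(**TRAINING_KWARGS)
    callbacks = [EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)]
    return args, callbacks
```

The returned `args` and `callbacks` would be passed to `Trainer` together with a model, datasets, and a `compute_metrics` function that reports an `f1` key.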

Training results

| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
| 2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
| 3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
| 4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
| 5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
| 6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
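The results stop at epoch 6 even though 7 epochs were configured. One plausible reading, assuming early stopping tracks F1 with a patience of 3 as listed under the hyperparameters, is that F1 peaked at epoch 3 and did not improve over the next three evaluations. The helper below is a simplified stand-in for that rule, not the Trainer's own code, and `early_stop` is a hypothetical name.

```python
from typing import List, Tuple


def early_stop(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (last epoch run, best epoch) for a higher-is-better metric."""
    best_score = float("-inf")
    best_epoch = 0
    stale = 0  # consecutive evaluations without improvement
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return len(scores), best_epoch


# F1 scores per epoch, copied from the training results.
f1_by_epoch = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]
print(early_stop(f1_by_epoch, patience=3))  # prints: (6, 3)
```

Under this reading, training halts after epoch 6 and, with "Load Best Model at End" enabled, the epoch 3 checkpoint would be the one selected by F1.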

Author

[email protected]

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2