Model Card: Redact-V1 PII Detection Model
Redact-V1 detects personally identifiable information (PII) in text so that it can be redacted. It is a deep learning model built with TensorFlow (loading it requires a TensorFlow Hub layer) and fine-tuned on the Redact-V1 dataset.
Overview
Redact-V1 is intended for PII detection in data-redaction and privacy-preservation workflows. It was trained and evaluated on the Redact-V1 dataset. Evaluation metrics are reported in the training performance file described under Model Details.
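The card does not describe how detections become redactions. The script in the Usage section scores a whole sentence and prints one probability per class, which suggests sentence-level detection rather than span-level tagging. Under that assumption, one simple option is to mask every sentence in which PII is detected. The sketch below illustrates this. `split_sentences`, `redact_text`, and the `classify` callable are hypothetical names, not part of the released code, and the regular-expression splitter is a deliberate simplification.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: split on '.', '!' or '?' followed by whitespace.

    A production pipeline would use a proper sentence segmenter.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def redact_text(text: str, classify: Callable[[str], List[str]]) -> str:
    """Replace each sentence that `classify` flags with a placeholder.

    `classify` is an assumed callable returning the PII class names
    detected in one sentence; an empty list means nothing was detected.
    """
    out = []
    for sentence in split_sentences(text):
        found = classify(sentence)
        if found:
            out.append(f"[REDACTED: {', '.join(found)}]")
        else:
            out.append(sentence)
    return " ".join(out)


# Usage with a toy classifier standing in for the real model.
def toy_classify(sentence: str) -> List[str]:
    return ["Email Address"] if "@" in sentence else []


print(redact_text("Hello there. Mail me at a@b.com. Thanks!", toy_classify))
# prints: Hello there. [REDACTED: Email Address] Thanks!
```

Masking whole sentences over-redacts by design; it trades precision for simplicity when only sentence-level scores are available.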
Model Details
- Model File: final_model.h5
- Labels: labels.json
Training metrics (loss, accuracy, precision, and recall) are recorded in the training performance file. Visualizations of model performance, including confusion matrices and training-history plots, are in the images folder.
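The card lists precision and recall among the recorded metrics but does not say how they were computed. For reference, per-class precision is TP / (TP + FP) and recall is TP / (TP + FN). The following sketch computes both in a multi-label setting where each example has a set of true and a set of predicted classes. The function name and the convention of returning 0.0 for an undefined ratio are assumptions, and the figures in the training performance file may have been produced differently (for example, by Keras metrics at a fixed threshold).

```python
from typing import Dict, Sequence, Set


def per_class_precision_recall(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    classes: Sequence[str],
) -> Dict[str, Dict[str, float]]:
    """Per-class precision = TP / (TP + FP) and recall = TP / (TP + FN).

    Returns 0.0 when a denominator is zero (an assumed convention).
    """
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = {"precision": precision, "recall": recall}
    return scores


# Usage: three examples; the second one is a Phone Number mislabeled as Email Address.
y_true = [{"Email Address"}, {"Phone Number"}, set()]
y_pred = [{"Email Address"}, {"Email Address"}, set()]
print(per_class_precision_recall(y_true, y_pred, ["Email Address", "Phone Number"]))
# prints: {'Email Address': {'precision': 0.5, 'recall': 1.0}, 'Phone Number': {'precision': 0.0, 'recall': 0.0}}
```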
Supported Classes
The model supports the following PII classes:
- People Name
- Card Number
- Account Number
- Social Security Number
- Government ID Number
- Date of Birth
- Password
- Tax ID Number
- Phone Number
- Residential Address
- Email Address
- IP Number
- Passport
- Driver License
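The script in the Usage section prints a probability for each supported class but leaves the decision rule open. A minimal sketch for turning those scores into detected classes follows. It assumes the label list is in the same order as the model outputs and uses a threshold of 0.5, which is an illustrative default, not a value documented for Redact-V1. `decode_predictions` is a hypothetical helper. If the output layer is a softmax rather than independent sigmoids, taking the single highest-scoring class may be more appropriate.

```python
from typing import List, Sequence, Tuple


def decode_predictions(
    labels: Sequence[str],
    probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not documented for Redact-V1
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs at or above `threshold`, highest first."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    # float() also converts NumPy scalars such as those returned by model.predict.
    hits = [(label, float(p)) for label, p in zip(labels, probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Usage with made-up scores for three of the supported classes.
print(decode_predictions(["People Name", "Account Number", "Email Address"], [0.87, 0.91, 0.03]))
# prints: [('Account Number', 0.91), ('People Name', 0.87)]
```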
Usage
Below is sample code to load and use the model in a Python environment:
import json

import tensorflow as tf
import tensorflow_hub as hub

# Paths to the model file and the label file (relative to the working directory).
MODEL_PATH = "final_model.h5"
LABELS_PATH = "labels.json"


def load_labels(labels_file):
    """Read the class names stored in labels.json."""
    with open(labels_file, "r", encoding="utf-8") as f:
        return json.load(f)


def main():
    print("Loading model from:", MODEL_PATH)
    # The saved model contains a TensorFlow Hub layer, so hub.KerasLayer must be
    # registered as a custom object. Loading it may require Keras 2, e.g.
    # TensorFlow 2.15 or earlier, or tf-keras with TF_USE_LEGACY_KERAS=1.
    model = tf.keras.models.load_model(MODEL_PATH, custom_objects={"KerasLayer": hub.KerasLayer})
    print("Model loaded successfully.")

    labels = load_labels(LABELS_PATH)
    print("Loaded labels:", labels)

    # Sample sentence for testing.
    sample_sentence = "John Doe's account number 1234567890 was flagged for review due to unusual activity."
    print("Sample sentence:", sample_sentence)

    # Predict on a batch of one string tensor; predictions[0] holds one score per label.
    predictions = model.predict(tf.constant([sample_sentence]))
    print("Predictions:")
    for label, prob in zip(labels, predictions[0]):
        print(f"{label}: {prob:.2f}")


if __name__ == "__main__":
    main()
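The card does not document the layout of labels.json. The script above zips its contents directly with the prediction vector, which is correct if the file is a JSON list in output order; if it is a JSON object, iterating over it yields only its keys. The sketch below could stand in for `load_labels`, assuming the file is either a list or one of two common mappings (index to name, or name to index). `normalize_labels` is a hypothetical helper.

```python
import json
from typing import Dict, List, Union


def normalize_labels(raw: Union[List[str], Dict[str, object]]) -> List[str]:
    """Return class names ordered by model output index (assumed layouts only)."""
    if isinstance(raw, list):
        return [str(x) for x in raw]
    if isinstance(raw, dict):
        # Layout A (assumed): {"0": "People Name", "1": "Card Number", ...}
        if all(str(k).isdigit() for k in raw):
            return [str(raw[k]) for k in sorted(raw, key=int)]
        # Layout B (assumed): {"People Name": 0, "Card Number": 1, ...}
        if all(isinstance(v, int) for v in raw.values()):
            return [name for name, _ in sorted(raw.items(), key=lambda kv: kv[1])]
    raise ValueError("Unrecognized labels.json layout")


def load_labels(labels_file: str) -> List[str]:
    """Drop-in replacement for the script's load_labels that always returns a list."""
    with open(labels_file, "r", encoding="utf-8") as f:
        return normalize_labels(json.load(f))
```

Sorting index keys with `key=int` avoids the string-order trap where "10" would sort before "2".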
Training Data & Source Code
- Training Data: The model was trained on the Redact-V1 dataset.
- Source Code: The training pipeline and preprocessing code can be reviewed in the NLU-Redact-PII repository.
License
This project is licensed under the Apache-2.0 license.