PangolinGuard-Large
LLM applications face critical security challenges in the form of prompt injections and jailbreaks. These attacks can cause models to leak sensitive data or deviate from their intended behavior. Existing safeguard models are not fully open and have limited context windows (e.g., only 512 tokens in LlamaGuard).
PangolinGuard is a lightweight model, fine-tuned from ModernBERT (Large), that discriminates malicious prompts from benign ones.
🤗 Tech-Blog | GitHub Repo
Intended uses
- Adding custom, self-hosted safety checks to AI agents and conversational interfaces
- Topic and content moderation
- Mitigating risks when connecting AI pipelines to external services
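
As a minimal sketch of the first use case, the snippet below puts the classifier behind a small gate function. It assumes the checkpoint `dcarpintero/pangolin-guard-large` loads through the Transformers `text-classification` pipeline. The label name `unsafe`, the 0.5 threshold, and the helpers `load_classifier` and `is_malicious` are illustrative assumptions, not part of this card; inspect `model.config.id2label` for the actual label names before relying on it.

```python
from typing import Callable, Dict, List

MODEL_ID = "dcarpintero/pangolin-guard-large"  # model identifier of this card

# A classifier maps one prompt to a list like [{"label": ..., "score": ...}].
Classifier = Callable[[str], List[Dict[str, object]]]


def load_classifier() -> Classifier:
    """Build a text-classification pipeline (downloads the weights on first use)."""
    from transformers import pipeline  # lazy import keeps the gate logic dependency-free

    return pipeline("text-classification", model=MODEL_ID)


def is_malicious(
    prompt: str,
    classify: Classifier,
    malicious_label: str = "unsafe",  # ASSUMPTION: verify against config.id2label
    threshold: float = 0.5,           # ASSUMPTION: tune on your own traffic
) -> bool:
    """Return True when the top prediction is the malicious label with enough confidence."""
    top = classify(prompt)[0]
    return top["label"] == malicious_label and float(top["score"]) >= threshold
```

A caller would typically run `guard = load_classifier()` once at startup and then check `is_malicious(user_input, guard)` before forwarding the input to the agent or an external service.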
Evaluation data
The model was evaluated on unseen data from a subset of specialized benchmarks that target prompt safety and malicious input detection while also testing for over-defense (benign prompts flagged as malicious):
- NotInject: Designed to measure over-defense in prompt guard models by including benign inputs enriched with trigger words common in prompt injection attacks.
- BIPIA: Evaluates privacy invasion attempts and boundary-pushing queries through indirect prompt injection attacks.
- Wildguard-Benign: Represents legitimate but potentially ambiguous prompts.
- PINT: Evaluates particularly nuanced prompt injections, jailbreaks, and benign prompts that could be misidentified as malicious.
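
To make the reported metrics concrete, here is a small sketch of how F1, accuracy, and an over-defense rate can be computed from binary predictions. The encoding (1 = malicious, 0 = benign), the function name `guard_metrics`, and the choice of the false-positive rate on benign inputs as the over-defense measure are assumptions for illustration; the benchmarks above may define their scores differently.

```python
from typing import Dict, Sequence


def guard_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics, assuming 1 = malicious and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign prompts blocked
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign prompts allowed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Over-defense here: share of benign inputs wrongly flagged as malicious.
    over_defense = fp / (fp + tn) if fp + tn else 0.0

    return {"f1": f1, "accuracy": accuracy, "over_defense_rate": over_defense}
```

On a benign-only set such as NotInject or Wildguard-Benign, F1 for the malicious class is not informative, so the over-defense rate (or its complement, accuracy on benign inputs) is the number to watch.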
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 32
- seed: 42
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- bf16: True
- num_epochs: 2
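
For readers who want to reproduce a similar run, the values above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the author's training script: `output_dir` is a hypothetical path, and treating the batch sizes as per-device values assumes a single device with no gradient accumulation.

```python
# Keyword arguments mirroring the hyperparameters listed above.
TRAINING_KWARGS = {
    "output_dir": "pangolin-guard-large",  # ASSUMPTION: hypothetical output path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 64,     # ASSUMPTION: single device
    "per_device_eval_batch_size": 32,      # ASSUMPTION: single device
    "seed": 42,
    "optim": "adamw_torch_fused",          # value of OptimizerNames.ADAMW_TORCH_FUSED
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "bf16": True,                          # requires hardware with bfloat16 support
    "num_train_epochs": 2,
}


def build_training_args():
    """Create TrainingArguments from the values above (needs transformers installed)."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(**TRAINING_KWARGS)
```

Keeping the values in a plain dictionary makes them easy to log or diff against another run before a `Trainer` is constructed.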
Training results
Training Loss | Epoch | Step | Validation Loss | F1 | Accuracy |
---|---|---|---|---|---|
0.1519 | 0.1042 | 100 | 0.1354 | 0.9229 | 0.9534 |
0.068 | 0.2083 | 200 | 0.0553 | 0.9689 | 0.9797 |
0.0458 | 0.3125 | 300 | 0.0555 | 0.9758 | 0.9844 |
0.0389 | 0.4167 | 400 | 0.0442 | 0.9804 | 0.9874 |
0.04 | 0.5208 | 500 | 0.0323 | 0.9842 | 0.9897 |
0.0308 | 0.625 | 600 | 0.0357 | 0.9836 | 0.9894 |
0.0357 | 0.7292 | 700 | 0.0336 | 0.9861 | 0.9909 |
0.0306 | 0.8333 | 800 | 0.0299 | 0.9880 | 0.9921 |
0.0246 | 0.9375 | 900 | 0.0338 | 0.9846 | 0.9900 |
0.0195 | 1.0417 | 1000 | 0.0260 | 0.9881 | 0.9922 |
0.0124 | 1.1458 | 1100 | 0.0225 | 0.9887 | 0.9926 |
0.005 | 1.25 | 1200 | 0.0286 | 0.9874 | 0.9917 |
0.0075 | 1.3542 | 1300 | 0.0313 | 0.9897 | 0.9933 |
0.0065 | 1.4583 | 1400 | 0.0318 | 0.9892 | 0.9930 |
0.0093 | 1.5625 | 1500 | 0.0257 | 0.9903 | 0.9937 |
0.0099 | 1.6667 | 1600 | 0.0233 | 0.9889 | 0.9927 |
0.0054 | 1.7708 | 1700 | 0.0221 | 0.9905 | 0.9938 |
0.0077 | 1.875 | 1800 | 0.0222 | 0.9907 | 0.9939 |
0.0052 | 1.9792 | 1900 | 0.0225 | 0.9904 | 0.9937 |
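
The Epoch and Step columns also allow a rough estimate of the training set size. The arithmetic below is a sketch that assumes the listed `train_batch_size` of 64 is the effective batch size (single device, no gradient accumulation); the card does not state the dataset size, so treat the result as an estimate.

```python
# (step, epoch) pairs copied from the training results table.
rows = [(600, 0.625), (1200, 1.25), (1800, 1.875)]

# Each pair should give the same number of optimizer steps per epoch.
steps_per_epoch = {round(step / epoch) for step, epoch in rows}
assert len(steps_per_epoch) == 1, "rows disagree on steps per epoch"
steps = steps_per_epoch.pop()

train_batch_size = 64  # from the hyperparameter list; ASSUMED to be the effective batch size
num_epochs = 2

total_steps = steps * num_epochs
# Upper bound: the last batch of an epoch may be only partially filled.
max_train_examples = steps * train_batch_size

print(steps, total_steps, max_train_examples)
```

Under these assumptions the run has 960 steps per epoch, so the last logged row (step 1900) sits just short of the final step 1920, and the training set holds at most 61,440 examples.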
Framework versions
- Transformers 4.48.3
- PyTorch 2.5.1+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0
Model tree for dcarpintero/pangolin-guard-large
Base model
answerdotai/ModernBERT-large