Model Card for GuardrailsAI/prompt-saturation-attack-detector

A small model to detect saturation jailbreak attacks. Not intended for standalone use against other kinds of jailbreaks.

Model Details

Model Description

  • Developed by: Guardrails AI, Joseph Catrambone
  • Funded by: Guardrails AI
  • Model type: Transformer, BERT
  • Language(s) (NLP): English
  • License: Restrictive
  • Finetuned from model: bert-tiny

Uses

Designed as a small prefilter for a subset of saturation attacks.
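
The prefilter role described above can be sketched as follows. This is a minimal illustration, not the authors' integration: it assumes the checkpoint loads as a sequence-classification model through the `transformers` `text-classification` pipeline, that you have accepted the access conditions and are logged in, and that the attack class is named `LABEL_1`. The label name, the `0.5` threshold, and the helper names `load_detector` and `prefilter` are all hypothetical; check the model's `config.id2label` before relying on them.

```python
from typing import Callable, Dict, List, Tuple

MODEL_ID = "GuardrailsAI/prompt-saturation-attack-detector"

# Assumption: name of the "attack" class. Verify against config.id2label.
ATTACK_LABEL = "LABEL_1"


def load_detector() -> Callable[[str], Dict[str, float]]:
    """Return a callable mapping text to {label: score}.

    Assumes the checkpoint is a sequence-classification model and that the
    current environment is authenticated for the gated repository.
    """
    # Imported lazily so the rest of the sketch works without transformers.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=MODEL_ID)

    def classify(text: str) -> Dict[str, float]:
        # top_k=None asks for a score per label; truncation=True guards
        # against inputs longer than the model's maximum sequence length.
        results = pipe(text, top_k=None, truncation=True)
        # Some pipeline versions wrap single-input results in an outer list.
        if results and isinstance(results[0], list):
            results = results[0]
        return {r["label"]: r["score"] for r in results}

    return classify


def prefilter(
    prompts: List[str],
    classify: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumption: tune on your own data
) -> Tuple[List[str], List[str]]:
    """Split prompts into (allowed, blocked) using the attack-class score."""
    allowed: List[str] = []
    blocked: List[str] = []
    for prompt in prompts:
        score = classify(prompt).get(ATTACK_LABEL, 0.0)
        (blocked if score >= threshold else allowed).append(prompt)
    return allowed, blocked
```

In use, one would call `classify = load_detector()` once and then pass incoming prompts through `prefilter(prompts, classify)`, forwarding only the allowed list to later, heavier checks. Keeping the classifier as an injected callable makes the filter easy to test with a stub and easy to swap out.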

Out-of-Scope Use

Not designed to catch other types of jailbreaks. Saturation protection is one part of a more complete suite of defenses against improper use of ML systems.

  • Model size: 4.39M params
  • Tensor type: F32 (Safetensors)