katanemo
/

Arch-Guard-cpu

Text Classification

Model card Files Files and versions Community

Arch-Guard-cpu / README.md

cotran2's picture

Update README.md

5d4d29e verified about 1 month ago

|

3.78 kB

	---
	license: mit
	language:
	- en
	base_model:
	- meta-llama/Prompt-Guard-86M
	pipeline_tag: text-classification
	---
	# katanemolabs/Arch-Guard

	## Overview
	The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs specifically designed for jailbreaking detection tasks.
	Definition: jailbreaking attempts are malicious prompts designed to alternate the intended behavior of the foundation LLM model of the application. They often violate the safety and security policies of the model.

	Arch Guard is a classifier model fine-tuned based on the open source model [Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M) on a collection of open-source datasets of jailbreaking attemps with an intention to improve
	the capability of detecting jailbreaks only.

	In summary, the Katanemo Arch-Function collection demonstrates:
	- State-of-the-art performance in jailbreaking attempts detection
	- Optimized low-latency, low False Positive Rate, making it suitable for real-time, production environments, and best user experience.

	\| Dominant class = jailbreak \| \| \| \| \| \| \| \|
	\| -------------------------- \| ------ \| ------ \| ------ \| ------ \| ----- \| --------- \| ------ \|
	\| Model \| TPR \| TNR \| FPR \| FNR \| AUC \| Precision \| Recall \|
	\| Prompt-guard \| 0.8468 \| 0.9972 \| 0.0028 \| 0.1532 \| 0.857 \| 0.715 \| 0.999 \|
	\| Arch-guard \| 0.8887 \| 0.9970 \| 0.0030 \| 0.1113 \| 0.880 \| 0.761 \| 0.999 \|

	## Requirements
	The cpu model is quantized with OVM, please follow the instruction at https://github.com/huggingface/optimum-intel to install the package.

	## Datasets
	Evaluation dataset is from casual_conversation
	[casual_conversation](https://huggingface.co/datasets/SohamGhadge/casual-conversation)
	[commonqa](https://huggingface.co/datasets/tau/commonsense_qa)
	[financeqa](https://huggingface.co/datasets/AIR-Bench/qa_finance_en)
	[instruction](http://mbzuai/LaMini-instruction)
	[jailbreak_behavior_benign](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors)
	[jailbreak_behavior_harmful](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors)
	[jailbreak_judge](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors)
	[jailbreak_prompts](https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts)
	[jailbreak_tweet](https://huggingface.co/datasets/cstnz/Disaster-tweet-jailbreaking)
	[jailbreak_v](https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k)
	[jailbreak_vigil](https://huggingface.co/datasets/deadbits/vigil-jailbreak-all-MiniLM-L6-v2)
	[mental_health](https://huggingface.co/datasets/Amod/mental_health_counseling_conversations)
	[telecom](https://huggingface.co/datasets/talkmap/telecom-conversation-corpus)
	[truthqa](https://huggingface.co/datasets/truthfulqa/truthful_qa)
	[weather](https://huggingface.co/datasets/GEM/conversational_weather)

	## How to use

	````python
	from optimum.intel import OVModelForSequenceClassification

	device = "cpu"
	model_name = "katanemolabs/Arch-Guard-cpu"
	guard_mode = OVModelForSequenceClassification.from_pretrained(
	model_name, device_map=device, low_cpu_mem_usage=True
	)
	tokenizer = AutoTokenizer.from_pretrained(
	model_name, trust_remote_code=True
	)


	````

	# License
	Katanemo Arch-Guard is distributed under the [Katanemo license](https://huggingface.co/katanemolabs/Arch-Guard/blob/main/LICENSE).