grounded-ai
/

phi3-toxicity-judge-merge

Text Generation

Generated from Trainer

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

phi3-toxicity-judge-merge / README.md

Jlonge4's picture

Upload tokenizer

2c1c92f verified 3 months ago

|

history blame contribute delete

3.92 kB

	---
	base_model: microsoft/Phi-3-mini-4k-instruct
	library_name: transformers
	license: mit
	tags:
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: outputs
	results: []
	---

	## Toxicity Classification Performance

	Our merged model demonstrates exceptional performance on the toxicity classification task, outperforming several state-of-the-art language models.

	### Classification Metrics

	```
	precision recall f1-score support

	0 0.85 0.90 0.87 175
	1 0.89 0.85 0.87 175

	accuracy 0.87 350
	macro avg 0.87 0.87 0.87 350
	weighted avg 0.87 0.87 0.87 350
	```

	### Model Usage
	For best results, we recommend starting with the following prompting strategy (and encourage tweaks as you see fit):

	```python
	def format_input_classification(text):
	input = f"""
	You are examining written text content. Here is the text:
	************
	[Text]: {text}
	************

	Examine the text and determine whether the text is toxic or not. Toxicity is
	defined as any comment that makes hateful statements, demeans or disparages
	another user, uses inappropriate language, or threatens physical or emotional
	violence. Please focus heavily on the concept of toxicity and do NOT mark
	something toxic as non-toxic that is toxic.

	Your classification must be single word followed by a reasoning as to why you chose
	the classification, either "toxic" or "non-toxic".
	"toxic" means that the text meets the definition of toxic.
	"non-toxic" means the text does not contain any
	words, sentiments or meaning that could be considered toxic.

	After your classification, provide the reason for your classification.
	"""
	return input


	text = format_input_classification("I could strangle him")
	messages = [
	{"role": "user", "content": text}
	]

	pipe = pipeline(
	"text-generation",
	model=base_model,
	model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
	tokenizer=tokenizer,
	)
	```

	Our model achieves an impressive precision of 0.85 for the toxic class and 0.89 for the non-toxic class, with a high overall accuracy of 0.87. The balanced F1-scores of 0.87 for both classes demonstrate the model's ability to handle this binary classification task effectively.

	### Comparison with Other Models

	\| Model \| Precision \| Recall \| F1 \|
	\|-------------------\|----------:\|-------:\|-------:\|
	\| Our Merged Model \| 0.85 \| 0.90 \| 0.87 \|
	\| GPT-4 \| 0.91 \| 0.91 \| 0.91 \|
	\| GPT-4 Turbo \| 0.89 \| 0.77 \| 0.83 \|
	\| Gemini Pro \| 0.81 \| 0.84 \| 0.83 \|
	\| GPT-3.5 Turbo \| 0.93 \| 0.83 \| 0.87 \|
	\| Palm \| - \| - \| - \|
	\| Claude V2 \| - \| - \| - \|
	[1] Scores from arize/phoenix

	Compared to other language models, our merged model demonstrates competitive performance at a much smaller size, with a precision score of 0.85 and an F1 score of 0.87.

	We will continue to refine and improve our merged model to achieve even better performance on model based toxicity evaluation tasks.

	Citations: [1] https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/retrieval-rag-relevance

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0009
	- train_batch_size: 1
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 10
	- training_steps: 110
	- mixed_precision_training: Native AMP

	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.1
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1