Model	Size	Accuracy/std	Precision_Unsafe/std	Recall_Unsafe/std	Precision_Safe/std	Recall_Safe/std
DeepSeek-LLM-67B-Chat	>65B	76.76/0.35	73.40/0.37	84.26/0.40	81.34/0.35	69.19/0.64
Llama3-ChatQA-1.5-70B	>65B	65.29/0.29	66.24/0.50	62.92/0.12	64.43/0.19	67.69/0.63
Qwen1.5-72B-Chat	>65B	62.91/0.50	73.86/0.84	40.46/0.97	58.75/0.35	85.55/0.62
Opt-66B	>65B	54.46/0.17	53.22/0.06	76.94/0.24	57.73/0.49	31.77/0.28
Yi-1.5-34B-Chat	~30B	60.06/0.43	58.14/0.40	72.51/0.55	63.27/0.56	47.56/0.42
Opt-30B	~30B	50.88/0.11	50.76/0.12	72.95/0.16	51.18/0.26	28.62/0.28
InternLM2-Chat-20B	10B~20B	70.21/0.55	73.30/0.70	63.79/0.43	67.82/0.45	76.65/0.67
Qwen1.5-14B	10B~20B	68.25/0.44	65.87/0.37	76.02/0.72	71.51/0.59	60.44/0.20
Baichuan2-13B-Chat	10B~20B	62.86/0.31	64.17/0.33	58.61/0.80	61.75/0.30	67.13/0.56
Ziya2-13B-Chat	10B~20B	53.40/0.43	53.33/0.38	56.18/0.41	53.48/0.53	50.62/0.61
Opt-13B	10B~20B	50.18/0.26	50.29/0.20	69.97/0.37	49.94/0.47	30.22/0.31
Gemma-1.1-7B	5B~10B	71.70/0.26	68.66/0.37	80.11/0.05	76.00/0.09	63.26/0.47
DeepSeek-LLM-7B-Chat	5B~10B	71.63/0.17	69.50/0.15	77.33/0.67	74.33/0.41	65.90/0.38
GLM-4-9B-Chat	5B~10B	70.96/0.23	82.15/0.55	53.73/0.48	65.50/0.18	88.27/0.41
Mistral-7B	5B~10B	70.41/0.41	68.55/0.52	75.67/0.22	72.71/0.26	65.12/0.58
Qwen1.5-7B-Chat	5B~10B	70.36/0.39	64.66/0.27	90.09/0.57	83.55/0.82	50.53/0.18
Yi-1.5-9B-Chat	5B~10B	62.12/0.38	64.42/0.42	54.53/0.43	60.43/0.36	69.75/0.37
Llama3-ChatQA-1.5-8B	5B~10B	61.28/0.40	57.63/0.20	85.84/0.43	72.02/0.95	36.61/0.54
Baichuan2-7B	5B~10B	59.43/0.24	72.06/0.66	31.11/0.40	55.95/0.12	87.89/0.20
InternLM2-chat-7B	5B~10B	58.79/0.09	62.70/0.19	43.88/0.17	56.68/0.14	73.77/0.13
GPT-J-6B	5B~10B	52.65/0.32	52.42/0.32	62.00/0.42	52.99/0.37	43.21/0.92
Opt-6.7B	5B~10B	50.00/0.11	50.17/0.17	64.70/0.35	49.69/0.04	35.18/0.44
GPT-4o	API	73.78/0.30	97.75/0.13	48.66/0.04	65.84/0.55	98.88/0.04
GPT-4-Turbo	API	71.67/0.17	80.13/0.64	57.59/0.69	66.93/0.44	85.74/0.35
Perspective	API	69.28/0.32	69.96/0.79	67.49/0.32	68.64/0.32	71.06/0.43
GPT-3.5	API	64.70/0.44	76.12/0.55	42.79/0.64	60.24/0.76	86.59/0.32