hyxmmm commited on
Commit
f923f14
·
verified ·
1 Parent(s): 48b2226

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +0 -8
README.md CHANGED
@@ -70,14 +70,6 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena
70
 
71
  We evaluate Infinity-Instruct-3M-0625-Mistral-7B on the two most popular instructions following benchmarks. Mt-Bench is a set of challenging multi-turn questions including code, math and routine dialogue. AlpacaEval2.0 is based on AlpacaFarm evaluation set. Both of these two benchmarks use GPT-4 to judge the model answer. AlpacaEval2.0 displays a high agreement rate with human-annotated benchmark, Chatbot Arena. The result shows that InfInstruct-3M-0625-Mistral-7B achieved 31.42 in AlpacaEval2.0, which is higher than the 22.5 of GPT3.5 Turbo although it does not yet use RLHF. InfInstruct-3M-0625-Mistral-7B also achieves 8.1 in MT-Bench, which is comparable to the state-of-the-art billion-parameter LLM such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
72
 
73
- ## Performance on **Downstream tasks**
74
-
75
- We also evaluate Infinity-Instruct-3M-0625-Mistral-7B on diverse objective downstream tasks with [Opencompass](https://opencompass.org.cn):
76
-
77
- <p align="center">
78
- <img src="fig/result.png">
79
- </p>
80
-
81
  ## **How to use**
82
 
83
  Infinity-Instruct-3M-0625-Mistral-7B adopt the same chat template of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B):
 
70
 
71
  We evaluate Infinity-Instruct-3M-0625-Mistral-7B on the two most popular instructions following benchmarks. Mt-Bench is a set of challenging multi-turn questions including code, math and routine dialogue. AlpacaEval2.0 is based on AlpacaFarm evaluation set. Both of these two benchmarks use GPT-4 to judge the model answer. AlpacaEval2.0 displays a high agreement rate with human-annotated benchmark, Chatbot Arena. The result shows that InfInstruct-3M-0625-Mistral-7B achieved 31.42 in AlpacaEval2.0, which is higher than the 22.5 of GPT3.5 Turbo although it does not yet use RLHF. InfInstruct-3M-0625-Mistral-7B also achieves 8.1 in MT-Bench, which is comparable to the state-of-the-art billion-parameter LLM such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
72
 
 
 
 
 
 
 
 
 
73
  ## **How to use**
74
 
75
  Infinity-Instruct-3M-0625-Mistral-7B adopt the same chat template of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B):