AIJapanese committed
Commit ac81f2e · verified · 1 parent: 1f0fe1b

Update README.md

Files changed (1):
  1. README.md +53 -1
README.md CHANGED
@@ -36,7 +36,7 @@ We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness)
  | Moriyasu_Qwen2_JP_7B (OURS) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
  | Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | **0.6120** | 0.7385 |
  | SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | **0.2331** | 0.8165 | 0.4760 | 0.7273 |
- | Llama-3-ELYZA-JP-8B |92.40 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
+ | Llama-3-ELYZA-JP-8B | 0.9240 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
  | Llama-3-Swallow-8B-Instruct-v0.1 | 0.9249 | 0.6212 | 0.9427 | **0.9373** | **0.9083** | 0.1961 | 0.7404 | 0.5000 | 0.7214 |
  | Tanuki-8B-dpo-v1.0| 0.7918 | 0.4305 | 0.9226 | 0.8229 | 0.7799 | 0.1168 | 0.7039 | 0.4360 | 0.6256 |
@@ -76,3 +76,55 @@ Due to limited computational resources, we conducted evaluations on only a selection of models.
  | Moriyasu_Qwen2_JP_7B (OURS) | **5.15** | 7.10 | **8.45** | **6.85** | **5.85** | **8.15** | **7.10** | **7.65** | **7.04** |
  | Llama-3-ELYZA-JP-8B | 3.65 | **7.2** | 7.3 | 4.00 | 5.55 | 6.70 | 5.80 | 7.85 | 6.01 |
  | Llama 3.1 Swallow 8B Instruct v0.1| 4.80 | 6.80 | 7.05 | 4.75 | 4.25 | 7.10 | 6.20 | 6.45 | 5.92 |
+
+ ### Elyza task 100
+
+ For this benchmark, we use the [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100) dataset and ELYZA's GPT-4o scoring prompt, which is linked from [this blog post](https://zenn.dev/elyza/articles/7ece3e73ff35f4).
+
+ | Model | Score |
+ |---|---|
+ | Moriyasu_Qwen2_JP_7B (OURS) | 3.37 |
+ | Llama-3-ELYZA-JP-8B | **3.66** |
+ | Llama 3.1 Swallow 8B Instruct v0.1 | 3.32 |
+
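+ As a rough illustration, a GPT-4o scoring call might look like the sketch below. This is not ELYZA's exact harness: the `openai` client usage is standard, but `GRADING_PROMPT` is a placeholder for ELYZA's actual scoring prompt from the blog post above, and `score_answer` and its message format are our assumptions.
+
+ ```python
+ # Hypothetical sketch of GPT-4o scoring for ELYZA-tasks-100.
+ # GRADING_PROMPT is a placeholder; substitute ELYZA's prompt from the blog above.
+ from openai import OpenAI
+
+ client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
+ GRADING_PROMPT = "..."  # ELYZA's scoring prompt goes here
+
+ def score_answer(task_input: str, model_output: str, reference: str) -> str:
+     # Ask GPT-4o to grade one model answer against the reference
+     completion = client.chat.completions.create(
+         model="gpt-4o",
+         temperature=0,
+         messages=[
+             {"role": "system", "content": GRADING_PROMPT},
+             {"role": "user", "content": (
+                 f"Input:\n{task_input}\n\n"
+                 f"Model answer:\n{model_output}\n\n"
+                 f"Reference answer:\n{reference}"
+             )},
+         ],
+     )
+     return completion.choices[0].message.content
+ ```
+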
+ ### Nejumi leaderboard 3
+ We will contact the Nejumi team soon to have this model evaluated on their benchmark.
+
+ # Usage
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ path = 'AIJapanese/Moriyasu_Qwen2_JP_7B'
+
+ # Load the model in bfloat16 and place it on available devices automatically
+ model = AutoModelForCausalLM.from_pretrained(
+     path,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     use_cache=True
+ )
+ tokenizer = AutoTokenizer.from_pretrained(path)
+
+ # "You are a sincere and excellent Japanese assistant. Always try to provide the most helpful answers possible."
+ system_prompt = "あなたは誠実で優秀な日本人アシスタントです。常に可能な限り最も役立つ回答を提供するように努めてください。"
+ # "What is the highest mountain in Japan?"
+ prompt = "日本で一番高い山は何ですか"
+
+ # Build the chat prompt with the model's chat template
+ conversation = [{"role": "system", "content": system_prompt}]
+ conversation.append({"role": "user", "content": prompt})
+ text = tokenizer.apply_chat_template(
+     conversation,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=2048,
+     do_sample=True,  # sampling must be enabled for temperature to take effect
+     temperature=0.1,
+     # top_p=0.95,
+     # top_k=40,
+ )
+ # Drop the prompt tokens so only the newly generated text is decoded
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
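+
+ For interactive use, generation can also be streamed token by token. This is a minimal sketch using `transformers`' `TextStreamer`, reusing `model`, `tokenizer`, and `model_inputs` from the block above:
+
+ ```python
+ from transformers import TextStreamer
+
+ # Print tokens to stdout as they are generated, skipping the echoed prompt
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+ _ = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=2048,
+     do_sample=True,
+     temperature=0.1,
+     streamer=streamer,
+ )
+ ```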