AIJapanese committed: Update README.md
We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) for this evaluation.

| Moriyasu_Qwen2_JP_7B (OURS) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
| Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | **0.6120** | 0.7385 |
| SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | **0.2331** | 0.8165 | 0.4760 | 0.7273 |
| Llama-3-ELYZA-JP-8B | 0.9240 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
| Llama-3-Swallow-8B-Instruct-v0.1 | 0.9249 | 0.6212 | 0.9427 | **0.9373** | **0.9083** | 0.1961 | 0.7404 | 0.5000 | 0.7214 |
| Tanuki-8B-dpo-v1.0 | 0.7918 | 0.4305 | 0.9226 | 0.8229 | 0.7799 | 0.1168 | 0.7039 | 0.4360 | 0.6256 |
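The right-most column is the unweighted mean of the eight per-task scores in each row; a quick arithmetic check in Python, using the Moriyasu row above:

```python
# Verify that the final column is the mean of the eight task scores.
scores = [0.9491, 0.9111, 0.9550, 0.8748, 0.8924, 0.1966, 0.8238, 0.5560]
average = sum(scores) / len(scores)
print(round(average, 4))  # ~0.7699, matching the reported average
```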
Due to limited computational resources, we conducted evaluations on only a selection of models.

| Moriyasu_Qwen2_JP_7B (OURS) | **5.15** | 7.10 | **8.45** | **6.85** | **5.85** | **8.15** | **7.10** | **7.65** | **7.04** |
| Llama-3-ELYZA-JP-8B | 3.65 | **7.20** | 7.30 | 4.00 | 5.55 | 6.70 | 5.80 | 7.85 | 6.01 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 4.80 | 6.80 | 7.05 | 4.75 | 4.25 | 7.10 | 6.20 | 6.45 | 5.92 |
### Elyza task 100

For this benchmark, we use the [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100) dataset and ELYZA's GPT-4o scoring prompt, taken from [this blog post](https://zenn.dev/elyza/articles/7ece3e73ff35f4).

|Model|Score|
|---|---|
| Moriyasu_Qwen2_JP_7B (OURS) | 3.37 |
| Llama-3-ELYZA-JP-8B | **3.66** |
| Llama 3.1 Swallow 8B Instruct v0.1 | 3.32 |
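A minimal sketch of this scoring loop follows. It assumes the dataset's `test` split exposes `input`, `output`, and `eval_aspect` fields; `GRADING_PROMPT` stands in for ELYZA's Japanese grading prompt from the linked blog (which asks the judge for a 1-5 score), and `generate_answer` stands in for running the model under evaluation (see the Usage section below). These stand-ins are ours, not part of this README.

```python
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()
tasks = load_dataset("elyza/ELYZA-tasks-100", split="test")

# Placeholder: paste ELYZA's grading prompt from the linked blog here.
GRADING_PROMPT = "..."

def generate_answer(question: str) -> str:
    # Stand-in: generate an answer with the model under evaluation,
    # e.g. using the snippet in the Usage section below.
    ...

def grade(task: dict, answer: str) -> int:
    """Ask GPT-4o to grade one answer on a 1-5 scale."""
    judge_input = GRADING_PROMPT.format(
        input=task["input"],              # the task instruction
        output=task["output"],            # the reference answer
        eval_aspect=task["eval_aspect"],  # grading criteria
        answer=answer,                    # the evaluated model's answer
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": judge_input}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

scores = [grade(task, generate_answer(task["input"])) for task in tasks]
print(sum(scores) / len(scores))  # reported above: 3.37 for Moriyasu_Qwen2_JP_7B
```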
### Nejumi leaderboard 3

We will contact Nejumi soon to have the model evaluated on this benchmark.

# Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'AIJapanese/Moriyasu_Qwen2_JP_7B'

# Load the model in bfloat16 and spread it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(path)

# System prompt: "You are a sincere and excellent Japanese assistant.
# Always try to provide the most helpful answers possible."
system_prompt = "あなたは誠実で優秀な日本人アシスタントです。常に可能な限り最も役立つ回答を提供するように努めてください。"
# User prompt: "What is the highest mountain in Japan?"
prompt = "日本で一番高い山は何ですか"

conversation = [{"role": "system", "content": system_prompt}]
conversation.append({"role": "user", "content": prompt})

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.1,
    # top_p=0.95,
    # top_k=40,
)

# Keep only the newly generated tokens, dropping the prompt.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
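The snippet above covers a single turn. For multi-turn use, the same pieces can be wrapped in a small helper that threads the conversation history through `apply_chat_template`; the `chat` helper below is our illustrative addition (it reuses `model`, `tokenizer`, and `system_prompt` from the snippet above), not part of the model's API.

```python
def chat(message, history=None, max_new_tokens=1024):
    # Start a fresh conversation with the system prompt if no history is given.
    if history is None:
        history = [{"role": "system", "content": system_prompt}]
    history.append({"role": "user", "content": message})

    text = tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        inputs.input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.1,
    )
    # Decode only the newly generated tokens.
    reply = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    history.append({"role": "assistant", "content": reply})
    return reply, history

reply, history = chat("日本で一番高い山は何ですか")  # "What is the highest mountain in Japan?"
reply, history = chat("その標高は?", history)  # Follow-up: "What is its elevation?"
```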