Update README.md
README.md
CHANGED
@@ -6,32 +6,7 @@ tags: []
 <!-- Provide a quick summary of what the model is/does. -->
 
 
-
+# Qwen2.5-7B-Instruct-dpo
 
-
-
-<!-- Provide a longer summary of what this model is. -->
-
-Qwen2.5-7B-Instruct-preference is a fine-tuned model based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). This model is fine-tuned on [original dataset](lightblue/response-dataset-plus-qwen-judged). Fine-tuning was carried out at a context length of 1024.
-
-### Benchmarking
-The benchmark score is obtained using [arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual).
-|Qwen2.5-7B-Instruct|Ours|
-|----|----|
-|50.0|66.2|
-
-### Model Details
-- Model size: 7B
-- Context length: 1024
-- Language: Japanese
-
-#### Training Procedure
-- learning_rate: 5e-6
-- train_batch_size: 4
-- eval_batch_size: 2
-- seed: 42
-- gradient_accumulation_steps: 4
-- lr_scheduler_type: cosine
-- num_epochs: 1.0
-#### Training Results
-- Loss: 0.226000
+## Dataset creation
+For training, we used existing datasets such as LMSYS-Chat-1M and OASST2, translated into Japanese. First, the prompts in each dataset were translated into Japanese and refined with GPT-4o mini. Then, the responses of Qwen2.5-7B-Instruct and GPT-4o mini to those prompts were evaluated by GPT-4o mini to build a preference dataset.
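
The last step of the dataset description (GPT-4o mini judging the responses of Qwen2.5-7B-Instruct and GPT-4o mini, then building a preference dataset) could look roughly like the sketch below. It is only an illustration under stated assumptions: the README does not say how the judge expresses its verdict, so the numeric `scores` field, the `JudgedPrompt` class, and the `to_preference_pair` helper are all hypothetical. The `prompt` / `chosen` / `rejected` layout is the one commonly used for DPO-style training, not necessarily the authors' exact schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgedPrompt:
    """One translated prompt with per-model responses and judge scores.

    Hypothetical structure: the source does not describe the judge's output format.
    """
    prompt: str
    responses: dict  # model name -> response text
    scores: dict     # model name -> numeric score from the judge (assumed)


def to_preference_pair(item: JudgedPrompt) -> Optional[dict]:
    """Turn a judged prompt into a chosen/rejected pair, or None on a tie."""
    # Rank model names from highest to lowest judge score.
    ranked = sorted(item.scores, key=item.scores.get, reverse=True)
    best, worst = ranked[0], ranked[-1]
    # A tie carries no preference signal, so skip it.
    if item.scores[best] == item.scores[worst]:
        return None
    return {
        "prompt": item.prompt,
        "chosen": item.responses[best],
        "rejected": item.responses[worst],
    }


example = JudgedPrompt(
    prompt="富士山の高さは?",
    responses={"qwen": "約3,776メートルです。", "gpt-4o-mini": "高い山です。"},
    scores={"qwen": 9.0, "gpt-4o-mini": 4.0},
)
pair = to_preference_pair(example)
```

Dropping ties is a design choice made for this sketch; a real pipeline might instead keep them with a neutral label or re-query the judge.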