junsashihara committed on
Commit d36055f · verified · 1 Parent(s): 2c4f3f2

Update README.md

Files changed (1)
  1. README.md +3 -28
README.md CHANGED
@@ -6,32 +6,7 @@ tags: []
  <!-- Provide a quick summary of what the model is/does. -->
 
 
- ## Qwen2.5-7B-Instruct-dpo
 
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- Qwen2.5-7B-Instruct-preference is a fine-tuned model based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). This model is fine-tuned on [original dataset](lightblue/response-dataset-plus-qwen-judged). The fine-tuned were carried out at a 1024 context length.
-
- ### Benchmarking
- The benchmark score is obtained using [arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual).
- |Qwen2.5-7B-Instruct|Ours|
- |----|----|
- |50.0|66.2|
-
- ### Model Details
- - Model size: 7B
- - Context length: 1024
- - Language: Japanese
-
- #### Training Procudure
- - learning_rate: 5e-6
- - train_batch_size: 4
- - eval_batch_size: 2
- - seed: 42
- - gradient_accumulation_steps: 4
- - lr_scheduler_type: cosine
- - num_epochs: 1.0
- #### Training Results
- - Loss: 0.226000
 
+ # Qwen2.5-7B-Instruct-dpo
 
+ ## Dataset Creation
+ For training, we used existing datasets such as LMSYS-Chat-1M and OASST2, translated into Japanese. First, the prompts from each dataset were translated into Japanese and then refined using GPT-4o mini. Next, the responses to those prompts from Qwen2.5-7B-Instruct and GPT-4o mini were evaluated by GPT-4o mini to create a preference dataset.
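
The final step described in the added README text (turning judged responses into a preference dataset) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `JudgedPrompt` class, the `build_preference_pairs` function, the numeric score scale, and the tie-skipping rule are all assumptions made for illustration. The README does not say how the GPT-4o mini judgments were encoded or how ties were handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JudgedPrompt:
    """Hypothetical record: one prompt, each model's response, and a judge score per model."""
    prompt: str
    responses: Dict[str, str]   # model name -> response text
    scores: Dict[str, float]    # model name -> judge score (assumed: higher is better)


def build_preference_pairs(judged: List[JudgedPrompt], min_margin: float = 0.0) -> List[dict]:
    """Build prompt/chosen/rejected records from judged responses.

    Prompts whose best and worst scores differ by no more than `min_margin`
    are skipped, since they carry no clear preference signal (an assumption).
    """
    pairs = []
    for item in judged:
        # Rank model names by judge score, best first.
        ranked = sorted(item.scores, key=item.scores.get, reverse=True)
        best, worst = ranked[0], ranked[-1]
        if item.scores[best] - item.scores[worst] <= min_margin:
            continue
        pairs.append({
            "prompt": item.prompt,
            "chosen": item.responses[best],
            "rejected": item.responses[worst],
        })
    return pairs


# Illustrative data only; the model keys and scores are made up.
example = [
    JudgedPrompt(
        prompt="富士山の高さは?",
        responses={"qwen": "約3000mです。", "gpt-4o-mini": "3776mです。"},
        scores={"qwen": 4.0, "gpt-4o-mini": 9.0},
    ),
]
preference_pairs = build_preference_pairs(example)
```

The `prompt` / `chosen` / `rejected` layout is a common convention for DPO-style training data; whether the dataset in this commit uses exactly these field names is not stated in the README.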