junsashihara/Qwen2.5-7B-Instruct-dpo

junsashihara committed on Jan 7

Commit d36055f · verified · 1 Parent(s): 2c4f3f2

Update README.md

Files changed (1)

README.md +3 -28

@@ -6,32 +6,7 @@ tags: []
 <!-- Provide a quick summary of what the model is/does. -->
-## Qwen2.5-7B-Instruct-dpo
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-Qwen2.5-7B-Instruct-preference is a fine-tuned model based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). The model was fine-tuned on the [original dataset](lightblue/response-dataset-plus-qwen-judged). Fine-tuning was carried out at a context length of 1024.
-### Benchmarking
-The benchmark score is obtained using [arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual).
-|Qwen2.5-7B-Instruct|Ours|
-|----|----|
-|50.0|66.2|
-### Model Details
-- Model size: 7B
-- Context length: 1024
-- Language: Japanese
-#### Training Procedure
-- learning_rate: 5e-6
-- train_batch_size: 4
-- eval_batch_size: 2
-- seed: 42
-- gradient_accumulation_steps: 4
-- lr_scheduler_type: cosine
-- num_epochs: 1.0
-#### Training Results
-- Loss: 0.226000

 <!-- Provide a quick summary of what the model is/does. -->
+# Qwen2.5-7B-Instruct-dpo
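+## Dataset Creation
+For training, we used Japanese translations of existing datasets such as LMSYS-Chat-1M and OASST2. First, the prompts from each dataset were translated into Japanese and then refined using GPT-4o mini. Next, responses to those prompts from Qwen2.5-7B-Instruct and GPT-4o mini were evaluated by GPT-4o mini to build a preference dataset.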
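
The new README text describes turning judged responses into a preference dataset but does not say how the judgments were converted into pairs. The following is a minimal sketch of one plausible final step, not the author's actual pipeline. It assumes the judge assigns each response a numeric score (higher is better), that near-ties are discarded, and that the output uses the `prompt` / `chosen` / `rejected` layout common for preference tuning. The `JudgedResponse` type, the `build_preference_pair` helper, and the `min_margin` threshold are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgedResponse:
    """One model response plus a judge score (hypothetical scheme: higher is better)."""
    model: str
    text: str
    score: float


def build_preference_pair(
    prompt: str,
    a: JudgedResponse,
    b: JudgedResponse,
    min_margin: float = 1.0,  # assumed threshold; not taken from the README
) -> Optional[dict]:
    """Return a prompt/chosen/rejected record, or None if the scores are too close."""
    if abs(a.score - b.score) < min_margin:
        # Near-ties carry little preference signal, so this sketch skips them.
        return None
    chosen, rejected = (a, b) if a.score > b.score else (b, a)
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}


if __name__ == "__main__":
    qwen = JudgedResponse("Qwen2.5-7B-Instruct", "response A", 6.0)
    gpt = JudgedResponse("GPT-4o mini", "response B", 9.0)
    print(build_preference_pair("translated Japanese prompt", qwen, gpt))
```

Dropping near-ties is a design choice made only for this sketch; the README does not state whether such filtering was applied.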