metadata
license: cc-by-4.0
base_model: davidkim205/komt-solar-10.7b-sft-v5
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: nhn_dpo_v3_komt-solar-10.7b-sft-v5_DPO
  results: []
ENERGY-DRINK-LOVE/eeve_dpo-v3
Our Team
- Youjin Chung
- Jingyeom Kim
Model
Base Model
Hardware and Software
- Hardware: 8 × A100 GPUs, used for training the model
- Software: DeepSpeed library and Hugging Face TRL Trainer
Dataset
- DPO_dataset
  - Self-built DPO dataset (using AI-hub datasets)
  - Translations of English datasets such as OpenOrca DPO (using our own model, ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024)
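
For readers unfamiliar with preference data, the sketch below shows the record layout that Hugging Face TRL's DPO trainer conventionally consumes: a `prompt`, a preferred `chosen` response, and a dispreferred `rejected` response. The sample text and the `validate_record` and `filter_records` helpers are hypothetical illustrations and are not taken from the datasets listed above.

```python
from typing import Dict, List

# Keys conventionally used for DPO preference pairs.
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_record(record: Dict[str, str]) -> bool:
    """Return True if all three fields are non-empty strings
    and the chosen and rejected responses differ."""
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]


def filter_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that pass validate_record."""
    return [r for r in records if validate_record(r)]


# Hypothetical sample data, for illustration only.
sample_records = [
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "chosen": "A cat sat on a mat.",
        "rejected": "Dogs are great pets.",
    },
    {   # chosen and rejected are identical, so there is no preference signal
        "prompt": "Say hello.",
        "chosen": "Hello!",
        "rejected": "Hello!",
    },
    {   # the rejected response is missing
        "prompt": "Say goodbye.",
        "chosen": "Goodbye!",
    },
]

clean = filter_records(sample_records)
print(len(clean))
```

Filtering out pairs with identical or missing responses is one simple sanity check; the actual preprocessing used for this model is not described in the card.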
Training Method
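
The tags on this card indicate DPO (Direct Preference Optimization) training. As a minimal sketch of the standard DPO objective for a single preference pair, the function below computes the loss from summed log-probabilities under the policy and the frozen reference model. The value `beta=0.1` and the example log-probabilities are assumptions chosen for illustration; the card does not report the hyperparameters actually used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the card
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a response
    given the prompt, under the policy or the reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains margin over the rejected one.
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy raises the chosen response and lowers the rejected one: lower loss.
improved = dpo_loss(-8.0, -15.0, -10.0, -12.0)

print(baseline > improved)
```

In practice the TRL trainer computes this over batches of token-level log-probabilities; the scalar version here only illustrates the shape of the objective.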
Benchmark
| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|
| 61.20 | 57.51 | 70.33 | 53.34 | 68.49 | 56.32 |
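
As a quick consistency check, the snippet below recomputes the Average column, assuming it is the unweighted mean of the five task scores. The card does not state how the average is computed, but the reported numbers are consistent with that assumption.

```python
# Scores copied from the benchmark table above.
scores = {
    "Ko-ARC": 57.51,
    "Ko-HellaSwag": 70.33,
    "Ko-MMLU": 53.34,
    "Ko-TruthfulQA": 68.49,
    "Ko-CommonGen V2": 56.32,
}

# Unweighted mean over the five tasks (assumed definition of "Average").
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 61.20
```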