ENERGY-DRINK-LOVE/komt_DPOv3

Text Generation

Generated from Trainer

text-generation-inference

Inference Endpoints


jingyeom committed on Mar 16

Commit 4577d50 • 1 Parent(s): 5ae1a12

Update README.md

Files changed (1)

README.md +24 -31

README.md CHANGED

@@ -13,48 +13,41 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# nhn_dpo_v3_komt-solar-10.7b-sft-v5_DPO
-This model is a fine-tuned version of [davidkim205/komt-solar-10.7b-sft-v5](https://huggingface.co/davidkim205/komt-solar-10.7b-sft-v5) on an unknown dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-07
-- train_batch_size: 1
-- eval_batch_size: 8
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 7
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 56
-- total_eval_batch_size: 56
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 1
-### Training results
-### Framework versions
-- Transformers 4.38.1
-- Pytorch 2.2.1+cu118
-- Datasets 2.17.1
-- Tokenizers 0.15.2

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# ENERGY-DRINK-LOVE/eeve_dpo-v3
+### Our Team
+* Youjin Chung
+* Jingyeom Kim
+## Model
+### Base Model
+* [davidkim205/komt-solar-10.7b-sft-v5](https://huggingface.co/davidkim205/komt-solar-10.7b-sft-v5)
+### Hardware and Software
+* Hardware: 8 × A100 GPUs for training
+* Software: DeepSpeed library and Hugging Face TRL Trainer
+### Dataset
+* DPO_dataset
+  * Self-built DPO dataset (using AI-hub datasets)
+  * Translations of English datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, using our in-house model)
+### Training Method
+* [DPO](https://arxiv.org/abs/2305.18290)
+## Benchmark
+**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**
+**[Ko-LLM-Leaderboard](https://www.aihub.or.kr/leaderboard/view.do?currMenu=500&topMenu=102)**
+* (4th place as of 2024-03-16)
+* ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6551c0e37bbfce18781a8748/tCSxiFXJkI3Pi2qAlVPh7.png)
+| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
+| ------: | -----: | -----------: | ------: | ------------: | --------------: |
+|   61.20 |  57.51 |        70.33 |   53.34 |         68.49 |           56.32 |
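
The Average column in the benchmark table can be sanity-checked against the five task scores. The sketch below assumes the leaderboard average is a plain unweighted mean of the per-task scores, which the README does not state. The function name `mean_score` is hypothetical and used only for illustration.

```python
# Per-task scores copied from the benchmark table in the README diff.
scores = {
    "Ko-ARC": 57.51,
    "Ko-HellaSwag": 70.33,
    "Ko-MMLU": 53.34,
    "Ko-TruthfulQA": 68.49,
    "Ko-CommonGen V2": 56.32,
}


def mean_score(task_scores):
    """Unweighted arithmetic mean of the per-task scores.

    Assumption: the leaderboard 'Average' is computed this way.
    """
    return sum(task_scores.values()) / len(task_scores)


average = mean_score(scores)
print(f"{average:.2f}")  # prints 61.20, matching the table's Average column
```

Under that assumption the computed mean agrees with the reported 61.20, so the table is internally consistent.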