moreh-sungmin committed • Commit 9207024 • Parent(s): 831e28b
Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ language:
 ---
 # **Introduction**
 MoMo-70B is trained via Supervised Fine-Tuning (SFT) using [LoRA](https://arxiv.org/abs/2106.09685), with the QWEN-72B model as its base model.
-This is a Direct Preference Optimization ([DPO](https://arxiv.org/abs/2305.18290)) version.
+This is a Direct Preference Optimization ([DPO](https://arxiv.org/abs/2305.18290)) version trained from v1.8.4 as a base model, with several optimizations to the hyperparameters.
 Note that we did not exploit any form of weight merge.
 For leaderboard submission, the trained weights are realigned for compatibility with llama.
 MoMo-70B is trained using **[Moreh](https://moreh.io/)**'s [MoAI platform](https://moreh.io/product), which simplifies the training of large-scale models, and AMD's MI250 GPU.
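
The card describes the recipe only at a high level: SFT with LoRA on QWEN-72B, then DPO on top of the v1.8.4 checkpoint. For orientation, below is a minimal, hypothetical sketch of such a DPO-on-LoRA step using Hugging Face TRL and PEFT. It is not the authors' code (their training ran on Moreh's MoAI platform), and the checkpoint path, dataset, and hyperparameters are placeholders.

```python
# Hypothetical sketch only: DPO fine-tuning with a LoRA adapter via TRL/PEFT.
# Assumes recent transformers/peft/trl releases; every name below is a placeholder,
# not the checkpoints, data, or hyperparameters actually used for MoMo-70B.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

SFT_CHECKPOINT = "path/to/sft-checkpoint"  # placeholder for the SFT model DPO starts from

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(SFT_CHECKPOINT, trust_remote_code=True)

# Preference pairs: each example pairs a preferred ("chosen") and a rejected response.
# trl-lib/ultrafeedback_binarized is a public example set, used here as a stand-in.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # which modules to adapt is architecture-dependent
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,                      # strength of the preference-vs-reference regularization
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Because only the LoRA adapter is trainable, TRL can obtain the reference policy required by DPO simply by disabling the adapter, which avoids keeping a second full copy of a 70B-class model in memory.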