Update README.md
README.md CHANGED
@@ -27,7 +27,7 @@ tags:

## Model Summary

-**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To not lose too much quality with this post-training, we also applied some extra training on the [
+**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this, we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To avoid losing too much quality with this post-training, we also applied some additional training on the "[openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)" dataset.

Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
@@ -118,4 +118,16 @@ Human-Like-DPO-Dataset:
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
+```
+
+UltraFeedback dataset:
+```bash
+@misc{cui2023ultrafeedback,
+      title={UltraFeedback: Boosting Language Models with High-quality Feedback},
+      author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
+      year={2023},
+      eprint={2310.01377},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
```
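
For readers who want to reproduce this kind of post-training, a minimal DPO run with the `trl` library might look like the sketch below. This is an illustration under stated assumptions, not the authors' actual training script: the hyperparameters (`beta`, learning rate, batch size, epochs) are placeholders, and the Human-Like-DPO-Dataset is assumed to expose the standard `prompt`/`chosen`/`rejected` preference columns.

```python
# Minimal DPO post-training sketch using trl (illustrative, not the authors' script).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumed to provide "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

# Hyperparameters below are assumptions, not the values used for this model.
config = DPOConfig(
    output_dir="SmolLM2-135M-Humanized",
    beta=0.1,                     # strength of the preference constraint
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                  # with no ref_model given, trl freezes a copy as the reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

The extra pass on openbmb/UltraFeedback mentioned above would be a second run of the same shape pointed at that dataset, after converting its ratings into chosen/rejected pairs, since UltraFeedback is not distributed in DPO format.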
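Since the stated goal is more natural conversational behavior, a quick way to try the result is an ordinary chat-style generation call. The checkpoint id below is a hypothetical placeholder, as this page does not name the final repo id.

```python
# Quick conversational smoke test (checkpoint id is a placeholder, not from this page).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "<your-namespace>/SmolLM2-135M-Humanized"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "Hey! How's your day going?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```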