Michielo committed · verified · Commit ab9c6b2 · 1 Parent(s): 8d7053f

Update README.md

Files changed (1):
  1. README.md (+13 -1)
README.md CHANGED
@@ -27,7 +27,7 @@ tags:
 
 ## Model Summary
 
-**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To not lose too much quality with this post-training, we also applied some extra training on the ["openbmb/UltraFeedback"](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.
+**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To not lose too much quality with this post-training, we also applied some extra training on the "[openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)" dataset.
 
 Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
 
@@ -118,4 +118,16 @@ Human-Like-DPO-Dataset:
   primaryClass={cs.CL},
   url={https://arxiv.org/abs/2501.05032},
 }
+```
+
+UltraFeedback dataset:
+```bash
+@misc{cui2023ultrafeedback,
+  title={UltraFeedback: Boosting Language Models with High-quality Feedback},
+  author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
+  year={2023},
+  eprint={2310.01377},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
 ```
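
For readers who want to reproduce the kind of DPO post-training the model summary describes, here is a minimal sketch using TRL on the Human-Like-DPO-Dataset. This is not the author's exact recipe: the hyperparameters are illustrative, and the dataset is assumed to expose the standard `prompt`/`chosen`/`rejected` preference columns expected by `DPOTrainer`.

```python
# Hypothetical DPO post-training sketch (not the exact recipe used for this model).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs used for the human-likeness alignment step.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

# Illustrative hyperparameters only; the model card does not state the real ones.
config = DPOConfig(
    output_dir="smollm2-135m-humanized-dpo",
    beta=0.1,                       # strength of the implicit KL penalty toward the reference model
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # named `tokenizer=` in older TRL releases
)
trainer.train()
```

The extra pass on openbmb/UltraFeedback mentioned in the summary would follow the same pattern, with that dataset substituted in after mapping it to prompt/chosen/rejected pairs.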
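
Since the summary highlights conversational use, here is a minimal inference sketch with `transformers`. The repository ID is an assumption based on the committer's namespace; check the model page for the actual path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID -- verify against the actual model page.
model_id = "Michielo/SmolLM2-135M-Humanized"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# SmolLM2-Instruct derivatives ship a chat template, so format the prompt with it.
messages = [{"role": "user", "content": "Hey, how was your weekend?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```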