Update README.md
README.md CHANGED
@@ -27,7 +27,7 @@ tags:

## Model Summary

-**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this, we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs).
+**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this, we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To avoid losing too much quality through this post-training, we also performed additional training on the ["openbmb/UltraFeedback"](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.

Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.

@@ -73,17 +73,17 @@ In this section, we report the evaluation results of SmolLM2. All evaluations ar

| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized | Difference |
|:-----------------------------|:---------------------:|:----------------------:|:----------:|
-| MMLU | **23.1** |
-| ARC (Easy) |
-| ARC (Challenge) | **26.1** | 25.
-| HellaSwag | **43.0** |
-| PIQA | **67.2** |
-| WinoGrande | **52.5** | 52.
-| TriviaQA | **0.3** | 0.
-| GSM8K | 0.2 | **0.
-| OpenBookQA |
-
-
+| MMLU | **23.1** | 23.0 | -0.1 |
+| ARC (Easy) | 54.3 | **55.0** | +0.7 |
+| ARC (Challenge) | **26.1** | 25.5 | -0.6 |
+| HellaSwag | **43.0** | 42.4 | -0.6 |
+| PIQA | **67.2** | 67.0 | -0.2 |
+| WinoGrande | **52.5** | 52.1 | -0.4 |
+| TriviaQA | **0.3** | 0.2 | -0.1 |
+| GSM8K | 0.2 | **0.8** | +0.6 |
+| OpenBookQA | 32.6 | **33.0** | +0.4 |
+| QuAC (F1) | **14.1** | 13.2 | -0.9 |
+


## Limitations
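For readers unfamiliar with the DPO step described in the updated summary, the sketch below shows one way such a preference fine-tune could be run with TRL's `DPOTrainer` on the Human-Like-DPO-Dataset. It is illustrative only: the hyperparameters, column handling, and training setup are assumptions, not the recipe actually used for this model.

```python
# Minimal DPO fine-tuning sketch (illustrative; not the actual training recipe).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Assumed to expose prompt/chosen/rejected columns, the preference
# format DPOTrainer expects.
train_dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

args = DPOConfig(
    output_dir="smollm2-135m-humanized-dpo",
    beta=0.1,                       # assumed KL-penalty strength
    learning_rate=5e-6,             # assumed
    per_device_train_batch_size=4,  # assumed; adjust to your hardware
    num_train_epochs=1,             # assumed
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL releases
)
trainer.train()
```

The extra pass on openbmb/UltraFeedback mentioned in the summary would follow the same pattern, but that dataset would first need to be binarized into the same prompt/chosen/rejected form.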
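Since the summary positions the model for conversational use, here is a minimal chat-style inference sketch with `transformers`. The repository id is a placeholder, the generation settings are arbitrary examples, and it assumes the fine-tune keeps the base instruct model's chat template.

```python
# Chat-style inference sketch; the repo id below is a placeholder, not confirmed by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<namespace>/SmolLM2-135M-Humanized"  # replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hey, how was your day?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```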