Michielo committed
Commit 8d7053f · verified · 1 Parent(s): d51d69c

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -27,7 +27,7 @@ tags:
 
 ## Model Summary
 
- **SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs).
+ **SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To avoid losing too much quality with this post-training, we also applied additional training on the ["openbmb/UltraFeedback"](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.
 
 Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
 
@@ -73,17 +73,17 @@ In this section, we report the evaluation results of SmolLM2. All evaluations ar
 
 | Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized | Difference |
 |:-----------------------------|:---------------------:|:----------------------:|:----------:|
- | MMLU | **23.1** | **23.1** | 0.0 |
- | ARC (Easy) | **54.3** | 50.2 | -4.1 |
- | ARC (Challenge) | **26.1** | 25.3 | -0.8 |
- | HellaSwag | **43.0** | 41.6 | -1.4 |
- | PIQA | **67.2** | 66.2 | -1.0 |
- | WinoGrande | **52.5** | 52.2 | -0.3 |
- | TriviaQA | **0.3** | 0.1 | -0.2 |
- | GSM8K | 0.2 | **0.5** | +0.3 |
- | OpenBookQA | **32.6** | 32.0 | -0.6 |
- | CommonSenseQA | **4.8** | 2.2 | -2.6 |
- | QuAC (F1) | **14.1** | 11.0 | -3.1 |
+ | MMLU | **23.1** | 23.0 | -0.1 |
+ | ARC (Easy) | 54.3 | **55.0** | +0.7 |
+ | ARC (Challenge) | **26.1** | 25.5 | -0.6 |
+ | HellaSwag | **43.0** | 42.4 | -0.6 |
+ | PIQA | **67.2** | 67.0 | -0.2 |
+ | WinoGrande | **52.5** | 52.1 | -0.4 |
+ | TriviaQA | **0.3** | 0.2 | -0.1 |
+ | GSM8K | 0.2 | **0.8** | +0.6 |
+ | OpenBookQA | 32.6 | **33.0** | +0.4 |
+ | QuAC (F1) | **14.1** | 13.2 | -0.9 |
+
 
 
 ## Limitations
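
For readers who want to see what the DPO post-training described in the updated summary could look like in practice, below is a minimal sketch using the `trl` library's `DPOTrainer` on `HumanLLMs/Human-Like-DPO-Dataset`. The commit does not record the actual training configuration, so every hyperparameter shown is an illustrative assumption, and the additional pass on `openbmb/UltraFeedback` mentioned in the summary would follow the same pattern with that dataset mapped to prompt/chosen/rejected columns.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base model named in the README; DPOTrainer builds the frozen reference
# model automatically when no ref_model is passed.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data; this dataset ships "prompt"/"chosen"/"rejected" columns,
# which is the format DPOTrainer expects.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

# Hypothetical hyperparameters -- the commit does not state the real ones.
training_args = DPOConfig(
    output_dir="smollm2-135m-humanized",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=1,
    beta=0.1,           # strength of the implicit KL penalty in the DPO loss
    logging_steps=50,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # recent trl versions; older ones use tokenizer=
)
trainer.train()
```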