File size: 3,379 Bytes
6928f5d
 
16b08cc
 
 
 
 
 
 
6928f5d
 
 
 
16b08cc
6928f5d
16b08cc
6928f5d
 
16b08cc
6928f5d
16b08cc
 
 
 
 
 
 
 
 
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
6928f5d
16b08cc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
---
library_name: transformers
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text2text-generation
---

# Model Card for Model ID

`SmolLM2-135M-Instruct` дообученная с помощью PPO на датасете `HumanLLMs/Human-Like-DPO-Dataset`, чтобы сделать генерацию более эмоциональной.

В PPO использована reward модель - https://huggingface.co/mcnckc/llm-hw2-reward-model


## Гиперпараметры дообучения

Обучение длилось одну эпоху. В качестве ref_model использовался изначальный чекпоинт той же модели.
```
num_train_epochs=1,
per_device_train_batch_size=8,
gradient_accumulation_steps=2,
learning_rate=1e-5
```
Использовался нестандартный паддинг - `tokenizer.pad_token = '<|endoftext|>'`
## Примеры генерации

### До дообучения

`system`

You are a helpful AI assistant named SmolLM, trained by Hugging Face

`user`

What's your morning routine like?

`assistant`

My morning routine is quite simple, but it's essential for maintaining a healthy and productive work-life balance. Here's what I do every morning:

1. **Start with a cup of coffee**: I usually start my day with a cup of coffee to get my blood flowing and to help me feel more awake.

2. **Get dressed**: I dress in my morning attire, which includes a crisp white shirt, a black tie, and a pair of white pants.

3. **Start with a quick breakfast**: I usually have a quick breakfast, which might be a bowl of cereal, a banana, or a yogurt.

4. **Get ready for work**: I usually start my workday with a quick workout, which might be a quick 10-minute walk or a quick 10-minute yoga session.

5. **Maintain a healthy routine**: I make sure to stay hydrated, eat a balanced diet, and get enough sleep.

6. **Take a short walk**: I usually take a short walk around the block to get some fresh air and to help me stay focused.

7. **Get ready for the day**: I usually start my day with a quick breakfast, a quick workout, and a quick breakfast.

### После дообучения

`system`

You are a helpful AI assistant named SmolLM, trained by Hugging Face

`user`

What's your morning routine like?

`assistant`

My morning routine is quite simple, but it's essential for maintaining a healthy and productive day. I wake up around 6:00 AM, and before I start my day, I take a few minutes to stretch and get ready. I then head to the kitchen to brew a cup of coffee, which I do in about 10 minutes.

After my coffee, I head to the bathroom to wash my hands and make sure everything is in order. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I