llm-hw2-ppo / README.md
mcnckc's picture
Update README.md
16b08cc verified
---
library_name: transformers
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text2text-generation
---
# Model Card for Model ID
`SmolLM2-135M-Instruct` дообученная с помощью PPO на датасете `HumanLLMs/Human-Like-DPO-Dataset`, чтобы сделать генерацию более эмоциональной.
В PPO использована reward модель - https://huggingface.co/mcnckc/llm-hw2-reward-model
## Гиперпараметры дообучения
Обучение длилось одну эпоху. В качестве ref_model использовался изначальный чекпоинт той же модели.
```
num_train_epochs=1,
per_device_train_batch_size=8,
gradient_accumulation_steps=2,
learning_rate=1e-5
```
Использовался нестандартный паддинг - `tokenizer.pad_token = '<|endoftext|>'`
## Примеры генерации
### До дообучения
`system`
You are a helpful AI assistant named SmolLM, trained by Hugging Face
`user`
What's your morning routine like?
`assistant`
My morning routine is quite simple, but it's essential for maintaining a healthy and productive work-life balance. Here's what I do every morning:
1. **Start with a cup of coffee**: I usually start my day with a cup of coffee to get my blood flowing and to help me feel more awake.
2. **Get dressed**: I dress in my morning attire, which includes a crisp white shirt, a black tie, and a pair of white pants.
3. **Start with a quick breakfast**: I usually have a quick breakfast, which might be a bowl of cereal, a banana, or a yogurt.
4. **Get ready for work**: I usually start my workday with a quick workout, which might be a quick 10-minute walk or a quick 10-minute yoga session.
5. **Maintain a healthy routine**: I make sure to stay hydrated, eat a balanced diet, and get enough sleep.
6. **Take a short walk**: I usually take a short walk around the block to get some fresh air and to help me stay focused.
7. **Get ready for the day**: I usually start my day with a quick breakfast, a quick workout, and a quick breakfast.
### После дообучения
`system`
You are a helpful AI assistant named SmolLM, trained by Hugging Face
`user`
What's your morning routine like?
`assistant`
My morning routine is quite simple, but it's essential for maintaining a healthy and productive day. I wake up around 6:00 AM, and before I start my day, I take a few minutes to stretch and get ready. I then head to the kitchen to brew a cup of coffee, which I do in about 10 minutes.
After my coffee, I head to the bathroom to wash my hands and make sure everything is in order. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.
I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.
I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.
I