|
--- |
|
library_name: transformers |
|
datasets: |
|
- HumanLLMs/Human-Like-DPO-Dataset |
|
language: |
|
- en |
|
base_model: |
|
- HuggingFaceTB/SmolLM2-135M-Instruct |
|
pipeline_tag: text2text-generation |
|
--- |
|
|
|
# Model Card for Model ID |
|
|
|
`SmolLM2-135M-Instruct` дообученная с помощью PPO на датасете `HumanLLMs/Human-Like-DPO-Dataset`, чтобы сделать генерацию более эмоциональной. |
|
|
|
В PPO использована reward модель - https://huggingface.co/mcnckc/llm-hw2-reward-model |
|
|
|
|
|
## Гиперпараметры дообучения |
|
|
|
Обучение длилось одну эпоху. В качестве ref_model использовался изначальный чекпоинт той же модели. |
|
``` |
|
num_train_epochs=1, |
|
per_device_train_batch_size=8, |
|
gradient_accumulation_steps=2, |
|
learning_rate=1e-5 |
|
``` |
|
Использовался нестандартный паддинг - `tokenizer.pad_token = '<|endoftext|>'` |
|
## Примеры генерации |
|
|
|
### До дообучения |
|
|
|
`system` |
|
|
|
You are a helpful AI assistant named SmolLM, trained by Hugging Face |
|
|
|
`user` |
|
|
|
What's your morning routine like? |
|
|
|
`assistant` |
|
|
|
My morning routine is quite simple, but it's essential for maintaining a healthy and productive work-life balance. Here's what I do every morning: |
|
|
|
1. **Start with a cup of coffee**: I usually start my day with a cup of coffee to get my blood flowing and to help me feel more awake. |
|
|
|
2. **Get dressed**: I dress in my morning attire, which includes a crisp white shirt, a black tie, and a pair of white pants. |
|
|
|
3. **Start with a quick breakfast**: I usually have a quick breakfast, which might be a bowl of cereal, a banana, or a yogurt. |
|
|
|
4. **Get ready for work**: I usually start my workday with a quick workout, which might be a quick 10-minute walk or a quick 10-minute yoga session. |
|
|
|
5. **Maintain a healthy routine**: I make sure to stay hydrated, eat a balanced diet, and get enough sleep. |
|
|
|
6. **Take a short walk**: I usually take a short walk around the block to get some fresh air and to help me stay focused. |
|
|
|
7. **Get ready for the day**: I usually start my day with a quick breakfast, a quick workout, and a quick breakfast. |
|
|
|
### После дообучения |
|
|
|
`system` |
|
|
|
You are a helpful AI assistant named SmolLM, trained by Hugging Face |
|
|
|
`user` |
|
|
|
What's your morning routine like? |
|
|
|
`assistant` |
|
|
|
My morning routine is quite simple, but it's essential for maintaining a healthy and productive day. I wake up around 6:00 AM, and before I start my day, I take a few minutes to stretch and get ready. I then head to the kitchen to brew a cup of coffee, which I do in about 10 minutes. |
|
|
|
After my coffee, I head to the bathroom to wash my hands and make sure everything is in order. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. |
|
|
|
I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. |
|
|
|
I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. |
|
|
|
I |