---
base_model: 
- unsloth/llama-2-7b-bnb-4bit
- hermeschen1116/response_generator_for_emotion_chat_bot
library_name: peft
license: apache-2.0
datasets:
- Shotaro30678/rlhf-RG-trl-style-v3
tags:
- trl
- unsloth
language:
- en
pipeline_tag: text-generation

---
# Response Generator for [Emotion Chat Bot](https://github.com/hermeschen1116/chat-bot)


## Model description

This model is a DPO fine-tuned version of [hermeschen1116/response_generator_for_emotion_chat_bot](https://huggingface.co/hermeschen1116/response_generator_for_emotion_chat_bot) trained on [Shotaro30678/rlhf-RG-trl-style-v3](https://huggingface.co/datasets/Shotaro30678/rlhf-RG-trl-style-v3), a self-modified version of [li2017dailydialog/daily_dialog](https://huggingface.co/datasets/li2017dailydialog/daily_dialog).

## Intended uses & limitations

The model is aligned with TRL's DPO trainer (an RLHF step on top of the SFT model) so that its responses become more precise and consistent.

## Model performance

### Model Comparison

**Sentiment score** (scored with [Shotaro30678/emotion_text_classifier_on_dd_v1](https://huggingface.co/Shotaro30678/emotion_text_classifier_on_dd_v1)):

| **Metric**   | **DPO Trained Model** | **SFT Model (Reference)** |
|--------------|:----------------------:|:--------------------------:|
| **Accuracy** | 0.851                 | 0.788                     |
| **F1-score** | 0.8564                | 0.7975                    |

**Gibberish distribution** (classified with [madhurjindal/autonlp-Gibberish-Detector-492513457](https://huggingface.co/madhurjindal/autonlp-Gibberish-Detector-492513457)):

| **Category**        | **DPO Trained Model** | **SFT Model (Reference)** |
|---------------------|:----------------------:|:--------------------------:|
| **Clean**           | 882                   | 898                       |
| **Mild Gibberish**  | 94                    | 58                        |
| **Word Salad**      | 21                    | 33                        |
| **Noise**           | 3                     | 11                        |

**Cut-Off Output:**

| **Output Type**     | **DPO Trained Model** | **SFT Model (Reference)** |
|---------------------|:----------------------:|:--------------------------:|
| **Complete Output** | 985                   | 975                       |
| **Incomplete Output** | 15                  | 25                        |

All metrics were computed on the test split of [hermeschen1116/daily_dialog_for_RG](https://huggingface.co/datasets/hermeschen1116/daily_dialog_for_RG).
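The distributions above each cover the 1,000 responses of the test split; a quick sanity check of the rates implied by the tables (all numbers taken directly from this card):

```python
# Sanity-check the evaluation tables: each distribution sums to 1,000 test responses.
dpo_gibberish = {"clean": 882, "mild": 94, "word_salad": 21, "noise": 3}
sft_gibberish = {"clean": 898, "mild": 58, "word_salad": 33, "noise": 11}

assert sum(dpo_gibberish.values()) == sum(sft_gibberish.values()) == 1000

dpo_clean_rate = dpo_gibberish["clean"] / 1000   # 0.882
sft_clean_rate = sft_gibberish["clean"] / 1000   # 0.898

dpo_complete_rate = 985 / 1000                   # 0.985
sft_complete_rate = 975 / 1000                   # 0.975

print(f"clean outputs:    DPO {dpo_clean_rate:.1%} vs SFT {sft_clean_rate:.1%}")
print(f"complete outputs: DPO {dpo_complete_rate:.1%} vs SFT {sft_complete_rate:.1%}")
```

In short: DPO trades a slightly lower clean-output rate for higher sentiment accuracy and fewer cut-off responses.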

**Tested with the following generation config:**
```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_new_tokens=150,
    min_new_tokens=5,
    repetition_penalty=1.1,
    top_k=3,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    temperature=1.0,
    do_sample=True,
    num_beams=1
)
```
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- beta=0.1,
- remove_unused_columns=False,
- num_train_epochs=3,
- gradient_checkpointing=True

All other hyperparameters were left at their default values.
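For context, a minimal sketch of how these settings map onto TRL 0.8.6's `DPOTrainer`. Only the four hyperparameters listed above are confirmed by this card; the output directory and the `model` / `tokenizer` / `dataset` variables are illustrative assumptions, not taken from the actual training script:

```python
# Hypothetical configuration sketch; only beta, remove_unused_columns,
# num_train_epochs, and gradient_checkpointing are confirmed from this card.
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="dpo_response_generator",  # assumed name
    num_train_epochs=3,
    gradient_checkpointing=True,
    remove_unused_columns=False,
)

trainer = DPOTrainer(
    model=model,            # the unsloth/PEFT SFT model loaded beforehand
    ref_model=None,         # with a PEFT adapter, the frozen base weights serve as the reference
    args=training_args,
    beta=0.1,
    train_dataset=dataset,  # Shotaro30678/rlhf-RG-trl-style-v3 (TRL-style prompt/chosen/rejected columns)
    tokenizer=tokenizer,
)
trainer.train()
```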

### Framework versions

- Bitsandbytes 0.43.1
- Datasets 2.20.0
- PEFT 0.11.1
- Pytorch 2.3.0+cu121
- Transformers 4.42.4
- Tokenizers 0.19.1
- Trl 0.8.6
- unsloth 2024.7 0f2e484
  
# Uploaded model

- **Developed by:** Shotaro30678
- **Finetuned from model:** hermeschen1116/response_generator_for_emotion_chat_bot

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Quick sample
```python
from transformers import GenerationConfig
from unsloth import FastLanguageModel

# ResponseGeneratorPipeline is provided in the project's GitHub repo (libs/)
from libs import ResponseGeneratorPipeline

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shotaro30678/response_generator_DPO",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference

bot = ResponseGeneratorPipeline(
    model,
    tokenizer,
    framework="pt",
    task="conversation-generation",
    num_workers=16,
    torch_dtype="auto",
    add_special_tokens=True,
    truncation=False,
    padding=True,
)

conversation = [
    {"role": "system", "content": {"dialog": "", "emotion": ""}},
    {"role": "user", "content": {"dialog": "Can you do push-ups ?", "emotion": "neutral"}},
    {"role": "assistant", "content": {"dialog": "Of course I can . It's a piece of cake ! Believe it or not , I can do 30 push-ups a minute .", "emotion": "neutral"}},
    {"role": "user", "content": {"dialog": "Really ? I think that's impossible !", "emotion": "surprise"}},
    {"role": "assistant", "content": {"dialog": "You mean 30 push-ups ?", "emotion": "neutral"}},
    {"role": "user", "content": {"dialog": "Yeah !", "emotion": "neutral"}},
    {"role": "assistant", "content": {"dialog": "", "emotion": "neutral"}},
]

generation_config = GenerationConfig(
    max_new_tokens=150,
    min_new_tokens=5,
    repetition_penalty=1.1,
    top_k=3,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    temperature=1.0,
    do_sample=True,
    num_beams=1,
)

print(bot(conversation, generation_config=generation_config)[0]["generated_text"][-1]["content"]["dialog"])
```
**output:**
```
30 push-ups in a row? 
```