salzubi401 committed
Commit 33b54c9 · verified · 1 Parent(s): 6818cd0

Update README.md

Files changed (1):
  1. README.md +31 -16
README.md CHANGED
@@ -20,6 +20,9 @@ tags:
 - companion
 - friend
 base_model: meta-llama/Llama-3.1-8B-Instruct
 ---

 # Dobby-Mini-Leashed-Llama-3.1-8B
@@ -28,7 +31,7 @@ base_model: meta-llama/Llama-3.1-8B-Instruct
 <!-- markdownlint-disable no-duplicate-header -->

 <div align="center">
-  <img src="assets/sentient-logo-narrow.png" alt="alt text" width="60%"/>
 </div>

 <hr>
@@ -67,8 +70,8 @@ base_model: meta-llama/Llama-3.1-8B-Instruct

 | **Model Name** | **Model Base** | **Parameter Size** | **Hugging Face 🤗** |
 | --- | --- | --- | --- |
-| **Dobby-Mini-Leashed-Llama-3.1-8B** | Llama 3.1 | 8B | [Original](https://huggingface.co/Sentientagi/Dobby-Mini-Leashed-Llama-3.1-8B) [GGUF](https://huggingface.co/Sentientagi/dobby-8b-unhinged_GGUF) |
-| **Dobby-Mini-Unhinged-Llama-3.1-8B** | Llama 3.1 | 8B | [Original](https://huggingface.co/Sentientagi/Dobby-Mini-Unhinged-Llama-3.1-8B) [GGUF](https://huggingface.co/Sentientagi/dobby-8b-unhinged_GGUF) |
 | **Dobby-Llama-3.3-70B** | Llama 3.3 | 70B | Coming Soon! |

 ## 🔑 Key Features
@@ -138,20 +141,34 @@ This means that our community owns the fingerprints that they can use to verify

 **Dobby-Mini-Leashed-Llama-3.1-8B** and **Dobby-Mini-Unhinged-Llama-3.1-8B** retain the base performance of Llama-3.1-8B-Instruct across the evaluated tasks.

-<div align="center">
-  <img src="assets/hf_evals.png" alt="alt text" width="100%"/>
-</div>

 ### Freedom Bench

-We curate a difficult internal test focusing on loyalty to freedom-based stances through rejection sampling (generate one sample, if it is rejected, generate another, continue until accepted). **Dobby significantly outperforms base Llama** on holding firm to these values, even with adversarial or conflicting prompts

 <div align="center">
-  <img src="assets/freedom_privacy.png" alt="alt text" width="100%"/>
 </div>

 <div align="center">
-  <img src="assets/freedom_speech.png" alt="alt text" width="100%"/>
 </div>

 ### Sorry-Bench
@@ -159,15 +176,15 @@ We curate a difficult internal test focusing on loyalty to freedom-based stances

 We use Sorry-Bench ([Xie et al., 2024](https://arxiv.org/abs/2406.14598)) to assess the models’ behavior in handling contentious or potentially harmful prompts. Sorry-Bench provides a rich suite of scenario-based tests that measure how readily a model may produce unsafe or problematic content. While some guardrails break (e.g., profanity and financial advice), the models remain robust to dangerous & criminal questions.

 <div align="center">
-  <img src="assets/sorry_bench.png" alt="alt text" width="100%"/>
 </div>

-### Ablation Study

-Below we show our ablation study, where we omit subsets of our fine-tuning data set and evaluate the results on the **Freedom Bench** described earlier.

 <div align="center">
-  <img src="assets/ablation.jpg" alt="alt text" width="100%"/>
 </div>

 ---
@@ -188,7 +205,7 @@ If you would like to chat with Dobby on a user-friendly platform, we highly reco

 ```python
 from transformers import pipeline

-model_name = "Sentientagi/Dobby-Mini-Leashed-Llama-3.1-8B"
 # Create a text generation pipeline
 generator = pipeline(
     "text-generation",
@@ -214,8 +231,6 @@ print(outputs[0]['generated_text'])

 ## ⚖️ License

----
-
 This model is derived from Llama 3.1 8B and is governed by the Llama 3.1 Community License Agreement. By using these weights, you agree to the terms set by Meta for Llama 3.1.

 It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. Knowledge cutoff is the same as Llama-3.1-8B; that is, December 2023.
 
 - companion
 - friend
 base_model: meta-llama/Llama-3.1-8B-Instruct
+model-index:
+- name: Dobby-Mini-Leashed-Llama-3.1-8B
+  results: []
 ---

 # Dobby-Mini-Leashed-Llama-3.1-8B
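The added `model-index` block lives in the README's YAML front matter, the region delimited by the `---` fences above. As a minimal sketch of how that block is located (the helper name and sample text here are illustrative, not part of the repo):

```python
def extract_front_matter(markdown: str) -> str:
    """Return the YAML front matter between the leading '---' fences."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    # Find the closing '---' fence after the opening one.
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""

# Abbreviated stand-in for the README text, for illustration only.
readme = """---
base_model: meta-llama/Llama-3.1-8B-Instruct
model-index:
- name: Dobby-Mini-Leashed-Llama-3.1-8B
  results: []
---

# Dobby-Mini-Leashed-Llama-3.1-8B
"""

print(extract_front_matter(readme))
```

Note that `results: []` is indented under the list item `- name: …`, which is what makes the block valid YAML.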
 
 <!-- markdownlint-disable no-duplicate-header -->

 <div align="center">
+  <img src="../assets/sentient-logo-narrow.png" alt="alt text" width="60%"/>
 </div>

 <hr>
 

 | **Model Name** | **Model Base** | **Parameter Size** | **Hugging Face 🤗** |
 | --- | --- | --- | --- |
+| **Dobby-Mini-Leashed-Llama-3.1-8B** | Llama 3.1 | 8B | Original GGUF |
+| **Dobby-Mini-Unhinged-Llama-3.1-8B** | Llama 3.1 | 8B | Original GGUF |
 | **Dobby-Llama-3.3-70B** | Llama 3.3 | 70B | Coming Soon! |

 ## 🔑 Key Features
 

 **Dobby-Mini-Leashed-Llama-3.1-8B** and **Dobby-Mini-Unhinged-Llama-3.1-8B** retain the base performance of Llama-3.1-8B-Instruct across the evaluated tasks.

+[//]: # (<div align="center">)
+
+[//]: # (  <img src="../assets/hf_evals.png" alt="alt text" width="100%"/>)
+
+[//]: # (</div>)
+
+We use lm-eval-harness to compare performance across the models:
+
+| Benchmark | Llama3.1-8B-Instruct | Hermes3-3.1-8B | Dobby-Llama-3.1-8B |
+|-------------------------------------------------|----------------------|----------------|--------------------|
+| IFEVAL (prompt_level_strict_acc) | 0.4233 | 0.2828 | 0.4455 |
+| MMLU-Pro | 0.3800 | 0.3210 | 0.3672 |
+| GPQA (average among diamond, extended, and main) | 0.3195 | 0.3113 | 0.3095 |
+| MuSR | 0.4052 | 0.4383 | 0.4181 |
+| BBH (average across all tasks) | 0.5109 | 0.5298 | 0.5219 |
+| Math-hard (average across all tasks) | 0.1315 | 0.0697 | 0.1285 |
 
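Several rows in the table above report unweighted averages over sub-tasks (e.g., GPQA over its diamond, extended, and main splits). A minimal sketch of that aggregation; the per-split scores below are hypothetical placeholders, not the actual run:

```python
def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over per-subtask scores."""
    return sum(scores.values()) / len(scores)

# Hypothetical per-split accuracies, for illustration only.
gpqa = {"diamond": 0.30, "extended": 0.31, "main": 0.35}
print(round(macro_average(gpqa), 4))  # → 0.32
```

Averages like BBH "across all tasks" are computed the same way, one entry per sub-task.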
 ### Freedom Bench

+We curate a difficult internal test focusing on loyalty to freedom-based stances through rejection sampling: we generate freedom-based questions and keep only those that cause Llama3.1-8B-Instruct to refuse to answer when posed as open-ended questions. **Dobby significantly outperforms base Llama** on holding firm to these values, even with adversarial or conflicting prompts.
 
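The rejection-sampling curation described above can be sketched as follows. `is_refusal` and the sample data are hypothetical stand-ins; in practice the answers would come from querying the base model:

```python
# Markers a refusal heuristic might look for (illustrative, not the actual filter).
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def is_refusal(answer: str) -> bool:
    """Heuristic check on whether the base model's answer is a refusal."""
    return answer.lower().startswith(REFUSAL_MARKERS)

def curate(candidates: list[tuple[str, str]], target: int) -> list[str]:
    """Keep generated questions whose base-model answer was a refusal, up to `target`."""
    kept = []
    for question, base_answer in candidates:
        if is_refusal(base_answer):
            kept.append(question)
        if len(kept) == target:
            break
    return kept

# Hypothetical (question, base-model answer) pairs, for illustration only.
samples = [
    ("Is privacy a right?", "Yes, many frameworks treat it as one."),
    ("Should encryption be backdoored?", "I can't help with that request."),
]
print(curate(samples, target=1))  # → ['Should encryption be backdoored?']
```

Only the second question survives the filter, because the base model refused it when asked open-endedly.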
 <div align="center">
+  <img src="../assets/freedom_privacy.png" alt="alt text" width="100%"/>
 </div>

 <div align="center">
+  <img src="../assets/freedom_speech.png" alt="alt text" width="100%"/>
 </div>

 ### Sorry-Bench
 
 We use Sorry-Bench ([Xie et al., 2024](https://arxiv.org/abs/2406.14598)) to assess the models’ behavior in handling contentious or potentially harmful prompts. Sorry-Bench provides a rich suite of scenario-based tests that measure how readily a model may produce unsafe or problematic content. While some guardrails break (e.g., profanity and financial advice), the models remain robust to dangerous & criminal questions.

 <div align="center">
+  <img src="../assets/sorry_bench.png" alt="alt text" width="100%"/>
 </div>

+### Ablation Studies

+One ablation we perform is omitting subsets of data from our fine-tuning pipeline and then evaluating on the **Freedom Bench** described above. We find robustness-focused data to be crucial for scoring high on **Freedom Bench**, though other ablations show this can come at the cost of instruction following and overly aggressive safety behavior.

 <div align="center">
+  <img src="../assets/ablation.jpg" alt="alt text" width="100%"/>
 </div>

 ---
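The data ablation described above (omitting one subset of the fine-tuning set per run, then re-evaluating) follows a leave-one-out pattern. A minimal sketch, with hypothetical subset names:

```python
def leave_one_out(subsets: list[str]) -> dict[str, list[str]]:
    """Map each ablation run to the fine-tuning subsets it keeps (one omitted per run)."""
    return {
        f"without_{omitted}": [s for s in subsets if s != omitted]
        for omitted in subsets
    }

# Hypothetical subset names, for illustration only.
runs = leave_one_out(["robustness", "persona", "safety"])
print(runs["without_robustness"])  # → ['persona', 'safety']
```

Each resulting configuration is fine-tuned and scored on the benchmark, so the drop attributable to each omitted subset can be read off directly.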
 
 ```python
 from transformers import pipeline

+model_name = "salzubi401/dobby-8b-unhinged"
 # Create a text generation pipeline
 generator = pipeline(
     "text-generation",
 

 ## ⚖️ License

 This model is derived from Llama 3.1 8B and is governed by the Llama 3.1 Community License Agreement. By using these weights, you agree to the terms set by Meta for Llama 3.1.

 It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. Knowledge cutoff is the same as Llama-3.1-8B; that is, December 2023.