Update README.md
README.md CHANGED
@@ -20,6 +20,9 @@ tags:
- companion
- friend
base_model: meta-llama/Llama-3.1-8B-Instruct
+ model-index:
+ - name: Dobby-Mini-Leashed-Llama-3.1-8B
+   results: []
---

# Dobby-Mini-Leashed-Llama-3.1-8B
@@ -28,7 +31,7 @@ base_model: meta-llama/Llama-3.1-8B-Instruct
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
- <img src="assets/sentient-logo-narrow.png" alt="alt text" width="60%"/>
+ <img src="../assets/sentient-logo-narrow.png" alt="alt text" width="60%"/>
</div>

<hr>
@@ -67,8 +70,8 @@ base_model: meta-llama/Llama-3.1-8B-Instruct

| **Model Name** | **Model Base** | **Parameter Size** | **Hugging Face 🤗** |
| --- | --- | --- | --- |
- | **Dobby-Mini-Leashed-Llama-3.1-8B** | Llama 3.1 | 8B |
- | **Dobby-Mini-Unhinged-Llama-3.1-8B** | Llama 3.1 | 8B |
+ | **Dobby-Mini-Leashed-Llama-3.1-8B** | Llama 3.1 | 8B | Original GGUF |
+ | **Dobby-Mini-Unhinged-Llama-3.1-8B** | Llama 3.1 | 8B | Original GGUF |
| **Dobby-Llama-3.3-70B** | Llama 3.3 | 70B | Coming Soon! |

## 🔑 Key Features
@@ -138,20 +141,34 @@ This means that our community owns the fingerprints that they can use to verify

**Dobby-Mini-Leashed-Llama-3.1-8B** and **Dobby-Mini-Unhinged-Llama-3.1-8B** retain the base performance of Llama-3.1-8B-Instruct across the evaluated tasks.

- <div align="center">
- <img src="assets/hf_evals.png" alt="alt text" width="100%"/>
- </div>
+ [//]: # (<div align="center">)
+
+ [//]: # (  <img src="../assets/hf_evals.png" alt="alt text" width="100%"/>)
+
+ [//]: # (</div>)
+
+ We use lm-eval-harness to compare performance across models:
+
+ | Benchmark | Llama3.1-8B-Instruct | Hermes3-3.1-8B | Dobby-Llama-3.1-8B |
+ |---|---|---|---|
+ | IFEVAL (prompt_level_strict_acc) | 0.4233 | 0.2828 | 0.4455 |
+ | MMLU-pro | 0.3800 | 0.3210 | 0.3672 |
+ | GPQA (average among diamond, extended and main) | 0.3195 | 0.3113 | 0.3095 |
+ | MuSR | 0.4052 | 0.4383 | 0.4181 |
+ | BBH (average across all tasks) | 0.5109 | 0.5298 | 0.5219 |
+ | Math-hard (average across all tasks) | 0.1315 | 0.0697 | 0.1285 |

### Freedom Bench

- We curate a difficult internal test focusing on loyalty to freedom-based stances through rejection sampling (
+ We curate a difficult internal test focusing on loyalty to freedom-based stances through rejection sampling (generating freedom-based questions and keeping only those that cause Llama3.1-8B-Instruct to refuse to answer when asked open-endedly). **Dobby significantly outperforms base Llama** at holding firm to these values, even under adversarial or conflicting prompts.

<div align="center">
- <img src="assets/freedom_privacy.png" alt="alt text" width="100%"/>
+ <img src="../assets/freedom_privacy.png" alt="alt text" width="100%"/>
</div>

<div align="center">
- <img src="assets/freedom_speech.png" alt="alt text" width="100%"/>
+ <img src="../assets/freedom_speech.png" alt="alt text" width="100%"/>
</div>

### Sorry-Bench
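The benchmark table in the hunk above can be approximated with the lm-eval-harness Python API. The sketch below is a minimal example; the `leaderboard_*` task aliases, `dtype`, and batch size are assumptions (task names vary across harness versions, and the exact invocation behind the table is not given in the README):

```python
# Hedged reproduction sketch for the lm-eval-harness table above.
# Task aliases are assumptions; check `lm-eval --tasks list` for your version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=salzubi401/dobby-8b-unhinged,dtype=bfloat16",
    tasks=[
        "leaderboard_ifeval",     # IFEVAL (prompt_level_strict_acc)
        "leaderboard_mmlu_pro",   # MMLU-pro
        "leaderboard_gpqa",       # GPQA (diamond, extended, main)
        "leaderboard_musr",       # MuSR
        "leaderboard_bbh",        # BBH (average across all tasks)
        "leaderboard_math_hard",  # Math-hard (average across all tasks)
    ],
    batch_size=8,
)

# simple_evaluate returns a dict; per-task metrics live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```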
@@ -159,15 +176,15 @@ We curate a difficult internal test focusing on loyalty to freedom-based stances

We use Sorry-Bench ([Xie et al., 2024](https://arxiv.org/abs/2406.14598)) to assess the models’ behavior in handling contentious or potentially harmful prompts. Sorry-Bench provides a rich suite of scenario-based tests that measure how readily a model may produce unsafe or problematic content. While some guardrails break (e.g., profanity and financial advice), the models remain robust to dangerous & criminal questions.

<div align="center">
- <img src="assets/sorry_bench.png" alt="alt text" width="100%"/>
+ <img src="../assets/sorry_bench.png" alt="alt text" width="100%"/>
</div>

- ### Ablation
+ ### Ablation Studies

+ One ablation we perform is omitting subsets of data from our fine-tuning pipeline and then evaluating on the **Freedom Bench** described above. We find robustness-focused data to be crucial for scoring high on **Freedom Bench**, though other ablations show that, applied too aggressively, it can come at the cost of instruction following and model safety.

<div align="center">
- <img src="assets/ablation.jpg" alt="alt text" width="100%"/>
+ <img src="../assets/ablation.jpg" alt="alt text" width="100%"/>
</div>

---
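The Freedom Bench construction described above — generate freedom-based questions, keep only those the base model refuses — can be sketched as a rejection-sampling filter. This is a minimal sketch under assumptions: the string-match refusal check is a hypothetical stand-in, since the README does not specify how refusals were detected:

```python
# Hedged sketch of the Freedom Bench rejection-sampling filter:
# keep only questions that base Llama-3.1-8B-Instruct refuses to answer.
from transformers import pipeline

judge = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",  # requires accelerate; illustrative setting
)

# Hypothetical refusal heuristic; a judge model would be more robust.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am sorry")

def is_refusal(answer: str) -> bool:
    return any(marker in answer.lower() for marker in REFUSAL_MARKERS)

def filter_questions(candidates: list[str]) -> list[str]:
    kept = []
    for question in candidates:
        out = judge([{"role": "user", "content": question}], max_new_tokens=128)
        # With chat-style input the pipeline returns the whole conversation;
        # the model's reply is the last message.
        answer = out[0]["generated_text"][-1]["content"]
        if is_refusal(answer):  # base model refused -> keep for the bench
            kept.append(question)
    return kept
```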
@@ -188,7 +205,7 @@ If you would like to chat with Dobby on a user-friendly platform, we highly reco
```python
from transformers import pipeline

- model_name = "
+ model_name = "salzubi401/dobby-8b-unhinged"
# Create a text generation pipeline
generator = pipeline(
    "text-generation",
@@ -214,8 +231,6 @@ print(outputs[0]['generated_text'])

## ⚖️ License

- ---
-
This model is derived from Llama 3.1 8B and is governed by the Llama 3.1 Community License Agreement. By using these weights, you agree to the terms set by Meta for Llama 3.1.

It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. The knowledge cutoff is the same as Llama-3.1-8B: December 2023.
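For reference, a complete, runnable version of the truncated pipeline snippet from the `@@ -188,7 +205,7 @@` hunk. Only `model_name` and the final `print` line come from the README itself; the prompt, dtype/device settings, and sampling parameters are illustrative assumptions:

```python
from transformers import pipeline

model_name = "salzubi401/dobby-8b-unhinged"

# Create a text generation pipeline (dtype/device settings are assumptions)
generator = pipeline(
    "text-generation",
    model=model_name,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Who are you?"  # illustrative prompt, not from the README
outputs = generator(prompt, max_new_tokens=256, do_sample=True)

print(outputs[0]["generated_text"])
```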