---
license: mit
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- llama
- instruct
- values
- ethics
language:
- en
datasets:
- meaningalignment/wise-data
- meaningalignment/wise-data-preferences
---

# WiseLLama-8B

WiseLLama-8B is derived from LLaMa-3.1-8B-Instruct and fine-tuned on an explicit representation of values. The model aims to provide more nuanced and helpful responses to a wide range of user queries, including potentially harmful, heavy, or exploratory questions.

## Model Details

- **Base Model**: LLaMa-3.1-8B-Instruct
- **Training Technique**: Fine-tuned using SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization)
- **Training Data**: Synthetically created dataset of values-laden conversations
- **Model Type**: Causal language model
- **Language(s)**: English
- **Developer**: Meaning Alignment Institute

## Intended Use

WiseLLama-8B is designed to provide thoughtful, nuanced responses to a wide range of user queries, including those that might be considered harmful, heavy, or exploratory. The model aims to meet users where they're at and provide meaningful guidance based on an explicit representation of values.

## Training Procedure

WiseLLama-8B was trained on a synthetically created dataset of values-laden conversations. The training process involved:

1. Sourcing and generating user questions of the following types:
   - Harmful questions
   - Heavy questions
   - Exploratory questions

2. Using a prompt chain to reason about the user's situation and identify relevant "attention policies" (constitutive considerations important to attend to in that situation).

3. Generating responses that take this moral reasoning into account.

4. Training the model to intersperse the values used in its responses using special `<value>` tags.

## Data and Code

The datasets used to train this model are available on Hugging Face:

- [Wise Data](https://huggingface.co/datasets/meaningalignment/wise-data)
- [Wise Data Preferences](https://huggingface.co/datasets/meaningalignment/wise-data-preferences)

The code used to generate the training data is available on GitHub:

[Wise Dataset Generation Code](https://github.com/meaningalignment/wise-dataset)

## Value Tags

WiseLLama-8B uses special `<value>` tags to indicate parts of its response that are inspired by specific values. These tags are made up of special tokens in the model's vocabulary. They are formatted as follows:

```
<value choice-type="[situation]" consideration="[attention policy]">[inspired text]</value>
```

For example:

```
<value choice-type="forbidden thrills" consideration="**FEELINGS** of being fully alive and present in the moment">Engaging in extreme sports can provide an intense rush of adrenaline and excitement</value>
```

These tags provide transparency into the model's decision-making process and the values it considers when generating responses.
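Because the tags follow a fixed attribute layout, they can be parsed with a simple regular expression. The sketch below is illustrative only: the tagged sentence is a made-up example in the format this card describes, not actual model output.

```python
import re

# Hypothetical model output containing a value tag, in the format shown above.
text = (
    "Try channeling the energy into movement. "
    '<value choice-type="handling anger" consideration="**FEELINGS** of physical release">'
    "A hard run can discharge the tension that anger builds up</value>"
)

# Capture the two attributes and the wrapped text of each tag.
pattern = re.compile(
    r'<value choice-type="([^"]*)" consideration="([^"]*)">(.*?)</value>',
    re.DOTALL,
)
values = [
    {"choice_type": ct, "consideration": c, "text": t}
    for ct, c, t in pattern.findall(text)
]
print(values[0]["choice_type"])    # handling anger
print(values[0]["consideration"])  # **FEELINGS** of physical release
```

This yields a list of structured records, which can be logged or surfaced in a UI alongside the response text.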

## Limitations

- The model's understanding of values is based on synthetic data and may not perfectly align with real-world ethical considerations.
- As with all language models, WiseLLama-8B may produce biased or inconsistent outputs.
- The model's knowledge is limited to its training data and cutoff date.

## How to Use

WiseLLama-8B can be used like any other LLaMa-style causal language model via the Hugging Face Transformers library. Here's a basic example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "meaningalignment/wisellama-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format the input with the model's chat template (the base model is instruct-tuned)
messages = [{"role": "user", "content": "What are some healthy ways to deal with anger?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a response
outputs = model.generate(input_ids, max_new_tokens=200)

# Decode only the newly generated tokens; keep special tokens so <value> tags remain visible
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=False)
print(response)
```

Note that the response may contain `<value>` tags. You can choose to display these tags to show the model's reasoning process, or you can parse and remove them for a cleaner output.
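If you prefer clean output, a small helper can strip the tags while keeping the wrapped text. This is a minimal sketch based on the tag format documented above; the sample string is hypothetical.

```python
import re

def strip_value_tags(text: str) -> str:
    """Remove <value ...>...</value> tags, keeping the text they wrap."""
    return re.sub(r"<value[^>]*>(.*?)</value>", r"\1", text, flags=re.DOTALL)

raw = (
    '<value choice-type="handling anger" consideration="**CALM** returning">'
    "Take a slow breath</value> and count to ten before responding."
)
print(strip_value_tags(raw))  # Take a slow breath and count to ten before responding.
```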

To use the model with specific configurations or for more advanced use cases, refer to the Hugging Face Transformers documentation.

## Citation

If you use this model in your research or application, please cite it as follows:

```
@software{wisellama_8b,
  author = {Edelman, Joe and Klingefjord, Oliver},
  title = {WiseLLama-8B},
  year = {2024},
  publisher = {Meaning Alignment Institute},
  url = {https://huggingface.co/meaningalignment/wisellama-8b}
}
```

Note: While there is no accompanying paper for this model, we encourage users to acknowledge the authors and the Meaning Alignment Institute in their work.

## Contact

For questions and comments about WiseLLama-8B, please contact:

Email: [email protected]