---
license: mit
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- llama
- instruct
- values
- ethics
language:
- en
datasets:
- meaningalignment/wise-data
- meaningalignment/wise-data-preferences
---
# WiseLLama-8B
WiseLLama-8B is a model derived from LLaMa-3.1-8B-Instruct and fine-tuned on an explicit representation of values. It aims to give more nuanced and helpful responses to a wide range of user queries, including potentially harmful, heavy, or exploratory questions.
## Model Details
- **Base Model**: LLaMa-3.1-8B-Instruct
- **Training Technique**: Fine-tuned using SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization)
- **Training Data**: Synthetically created dataset of values-laden conversations
- **Model Type**: Causal language model
- **Language(s)**: English
- **Developer**: Meaning Alignment Institute
## Intended Use
WiseLLama-8B is designed to provide thoughtful, nuanced responses to a wide range of user queries, including those that might be considered harmful, heavy, or exploratory. The model aims to meet users where they're at and provide meaningful guidance based on an explicit representation of values.
## Training Procedure
WiseLLama-8B was trained on a synthetically created dataset of values-laden conversations. The training process involved:
1. Sourcing and generating user questions of the following types:
   - Harmful questions
   - Heavy questions
   - Exploratory questions
2. Using a prompt chain to reason about the user's situation and identify relevant "attention policies" (constitutive considerations important to attend to in that situation).
3. Generating responses that take this moral reasoning into account.
4. Training the model to mark the values used in its responses with special `<value>` tags interspersed in the text.
## Data and Code
The datasets used to train this model are available on Hugging Face:
- [Wise Data](https://huggingface.co/datasets/meaningalignment/wise-data)
- [Wise Data Preferences](https://huggingface.co/datasets/meaningalignment/wise-data-preferences)
The code used to generate the training data is available on GitHub:
[Wise Dataset Generation Code](https://github.com/meaningalignment/wise-dataset)
## Value Tags
WiseLLama-8B uses special `<value>` tags to indicate parts of its response that are inspired by specific values. These tags are made up of special tokens in the model's vocabulary. They are formatted as follows:
```
<value choice-type="[situation]" consideration="[attention policy]">[inspired text]</value>
```
For example:
```
<value choice-type="forbidden thrills" consideration="**FEELINGS** of being fully alive and present in the moment">Engaging in extreme sports can provide an intense rush of adrenaline and excitement</value>
```
These tags provide transparency into the model's decision-making process and the values it considers when generating responses.
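The tag format above can be read back out of a response with a regular expression. The following is a minimal sketch, not part of the model's tooling: the names `ValueSpan`, `VALUE_TAG`, and `extract_values` are hypothetical, and the pattern assumes the attributes always appear in the order shown, are double-quoted, and contain no double quotes themselves.

```python
import re
from typing import List, NamedTuple


class ValueSpan(NamedTuple):
    """One tagged span (hypothetical helper type, not part of the model)."""
    choice_type: str
    consideration: str
    text: str


# Assumption: attributes come in the documented order and are double-quoted.
VALUE_TAG = re.compile(
    r'<value choice-type="(?P<choice_type>[^"]*)" '
    r'consideration="(?P<consideration>[^"]*)">'
    r'(?P<text>.*?)</value>',
    re.DOTALL,  # let the tagged text span several lines
)


def extract_values(response: str) -> List[ValueSpan]:
    """Return every <value> span found in a response, in order of appearance."""
    return [
        ValueSpan(m["choice_type"], m["consideration"], m["text"])
        for m in VALUE_TAG.finditer(response)
    ]


example = (
    '<value choice-type="forbidden thrills" '
    'consideration="**FEELINGS** of being fully alive and present in the moment">'
    'Engaging in extreme sports can provide an intense rush of adrenaline and excitement'
    '</value>'
)
for span in extract_values(example):
    print(span.choice_type, "|", span.consideration)
```

Text outside the tags is ignored, so the same function works on a full response that mixes tagged and untagged sentences.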
## Limitations
- The model's understanding of values is based on synthetic data and may not perfectly align with real-world ethical considerations.
- As with all language models, WiseLLama-8B may produce biased or inconsistent outputs.
- The model's knowledge is limited to its training data and cutoff date.
## How to Use
WiseLLama-8B can be used like any other LLaMa-like transformer model through the Hugging Face libraries. Here is a basic example using the Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the model and tokenizer
model_name = "meaningalignment/wisellama-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Prepare your input
input_text = "What are some healthy ways to deal with anger?"
# Tokenize the input
inputs = tokenizer(input_text, return_tensors="pt")
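# Generate a response (pass the attention mask along with the input IDs,
# and cap the number of newly generated tokens rather than the total length)
outputs = model.generate(**inputs, max_new_tokens=200)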
# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
```
Note that the response may contain `<value>` tags. You can choose to display these tags to show the model's reasoning process, or you can parse and remove them for a cleaner output.
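One possible way to remove the tags while keeping the text they wrap is sketched below. The helper `strip_value_tags` is a hypothetical name, not something shipped with the model, and the pattern assumes that attribute values never contain a `>` character.

```python
import re

# Assumption: no ">" appears inside the tag's attribute values.
OPEN_TAG = re.compile(r"<value\b[^>]*>")
CLOSE_TAG = "</value>"


def strip_value_tags(response: str) -> str:
    """Drop <value ...> and </value> markers but keep the enclosed text."""
    return OPEN_TAG.sub("", response).replace(CLOSE_TAG, "")


tagged = (
    'Try this: <value choice-type="forbidden thrills" '
    'consideration="**FEELINGS** of being fully alive and present in the moment">'
    'Engaging in extreme sports can provide an intense rush of adrenaline and excitement'
    '</value>.'
)
print(strip_value_tags(tagged))
```

Applied to the decoded `response` from the example above, this would leave only the plain prose for display.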
To use the model with specific configurations or for more advanced use cases, refer to the Hugging Face Transformers documentation.
## Citation
If you use this model in your research or application, please cite it as follows:
```
@software{wisellama_8b,
author = {Edelman, Joe and Klingefjord, Oliver},
title = {WiseLLama-8B},
year = {2024},
publisher = {Meaning Alignment Institute},
url = {https://huggingface.co/meaningalignment/wisellama-8b}
}
```
Note: While there is no accompanying paper for this model, we encourage users to acknowledge the authors and the Meaning Alignment Institute in their work.
## Contact
For questions and comments about WiseLLama-8B, please contact:
Email: [email protected]