---
datasets:
- samhog/psychology-10k
---

# Psychology Alpaca 🍩

This is a LLaMA-7B language model fine-tuned on 10,000 psychology-related prompts and answers generated by ChatGPT. Training ran on a single A100 GPU in Google Colab. The model shows some knowledge of psychology and generally performs better than its base model.

### Background 💡

This model was developed as part of a thesis project in machine learning and psychology. It served as the base model for further fine-tuning with reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* (RLHF) with reinforcement learning from *AI feedback* (RLAIF).

### Paper 📜

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper is available [on DiVA](https://www.diva-portal.org/smash/record.jsf?dswid=3835&pid=diva2%3A1782683&c=2&searchType=SIMPLE&language=en&query=rlhf&af=%5B%5D&aq=%5B%5B%5D%5D&aq2=%5B%5B%5D%5D&aqe=%5B%5D&noOfRows=50&sortOrder=author_sort_asc&sortOrder2=title_sort_asc&onlyFullText=false&sf=undergraduate).

### Usage 🏂

```
from peft import PeftModel
# Released versions of transformers name these classes LlamaTokenizer and
# LlamaForCausalLM (not LLaMATokenizer / LLaMAForCausalLM).
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the base model weights in 8-bit (requires bitsandbytes and accelerate)
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Add the PEFT adapter to the base weights to get the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
```
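Once `model` and `tokenizer` are loaded, a generation call might look like the sketch below. It assumes the model expects the Stanford Alpaca prompt template, which this card does not confirm. The helper names (`build_prompt`, `extract_response`, `generate_answer`) and the sampling values are illustrative choices, not settings taken from the thesis.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Format a request with the Stanford Alpaca template (an assumption)."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


def extract_response(generated_text: str) -> str:
    """Keep only the text after the last '### Response:' marker."""
    return generated_text.split("### Response:")[-1].strip()


def generate_answer(model, tokenizer, instruction: str, max_new_tokens: int = 256) -> str:
    """Generate an answer with an already loaded model and tokenizer."""
    # Imported lazily so the prompt helpers work without torch installed.
    import torch
    from transformers import GenerationConfig

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    # Illustrative sampling settings; tune them for your use case.
    config = GenerationConfig(
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
    )
    with torch.no_grad():
        output_ids = model.generate(**inputs, generation_config=config)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_response(text)
```

With the model loaded as shown earlier, `generate_answer(model, tokenizer, "How can I cope with exam stress?")` would return only the generated answer, because the decoded output repeats the prompt and `extract_response` strips it.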

**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)

**Authors:** Samuel Höglund; Josef Khedri