metadata

datasets:
  - samhog/psychology-10k

Psychology Alpaca 🍩

This is a LLaMA-7B language model fine-tuned on 10,000 psychology-related prompts and answers generated by ChatGPT. It was trained on a single A100 GPU in Google Colab. The model shows some knowledge of psychology and generally performs better than its base model.

Background 💡

This model was developed as part of a thesis project in machine learning and psychology. It served as the base model for further fine-tuning with reinforcement learning. The goal of the thesis was to compare reinforcement learning from human feedback (RLHF) with reinforcement learning from AI feedback (RLAIF).

Paper 📜

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found here!

Usage 🏂

from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# Load the tokenizer of the LLaMA-7B base model
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the base model weights in 8-bit (requires the bitsandbytes and accelerate packages)
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Apply the PEFT adapter on top of the base weights to obtain the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
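
Once the model is loaded, a response can be generated along the following lines. This is a minimal sketch, not the authors' inference code: it assumes the model expects the standard Alpaca prompt template (the card does not state the template used during training), and the helper names `build_prompt` and `generate_answer` are hypothetical.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt (assumed template, not confirmed by the model card)."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


def generate_answer(model, tokenizer, instruction: str, max_new_tokens: int = 256) -> str:
    """Generate a reply with an already loaded model and tokenizer (hypothetical helper)."""
    import torch  # imported here so build_prompt stays usable without torch installed

    prompt = build_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; sampling settings are a free choice
        )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only the text after the response marker of the assumed template
    return text.split("### Response:")[-1].strip()
```

With the `model` and `tokenizer` objects from the snippet above, `generate_answer(model, tokenizer, "I often feel anxious before exams. What can I do?")` would return the generated reply as a string. Greedy decoding is used here only to keep the sketch deterministic.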

Links: RLHF model; RLAIF model

Authors: Samuel Höglund; Josef Khedri