---
title: Activate Love
emoji: ❤️
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: true
license: mit
short_description: Steering AI Text Generation
---
Activate Love ❤️
A Gradio app, running as a Hugging Face Space, that replicates results from the paper »Activation Addition: Steering Language Models Without Optimization«.
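The core arithmetic behind activation addition fits in a few lines. The sketch below is not the app's code. It uses plain Python lists as stand-in activations instead of GPT-2 XL's residual stream, and the name `steering_vector`, the toy numbers, and the coefficient value are illustrative assumptions. It takes the activations of a positive and a negative prompt (say "Love" and "Hate") at one layer, assumes both prompts were already padded to the same number of tokens, and returns their scaled difference.

```python
from typing import List

Vector = List[float]
Acts = List[Vector]  # one residual-stream vector per token position


def steering_vector(pos_acts: Acts, neg_acts: Acts, coeff: float) -> Acts:
    """Return coeff * (pos_acts - neg_acts), position by position.

    Both inputs are activations taken from the same layer for the positive
    and the negative prompt. Assumption: the prompts were padded to equal
    token length beforehand, so the sequences line up position by position.
    """
    if len(pos_acts) != len(neg_acts):
        raise ValueError("prompts must be padded to the same token length")
    return [
        [coeff * (p - n) for p, n in zip(p_vec, n_vec)]
        for p_vec, n_vec in zip(pos_acts, neg_acts)
    ]


if __name__ == "__main__":
    # Hypothetical 3-dimensional activations for a one-token "Love"/"Hate" pair.
    love = [[1.0, 0.5, -2.0]]
    hate = [[-1.0, 0.5, 1.0]]
    print(steering_vector(love, hate, coeff=5.0))  # [[10.0, 0.0, -15.0]]
```

The coefficient controls how strongly the difference pushes generation toward the positive prompt; dimensions where both prompts agree cancel out to zero.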
Demo
Check it out at https://huggingface.co/spaces/janraasch/activate-love 🎯.
Raison d'être
This is my final project for the AI Safety Fundamentals course on AI Alignment.
When we covered Mechanistic Interpretability in session six, my cohort's instructor mentioned the paper on activation addition published in late 2023. I found it an enjoyable & interesting way to play around with the inner workings of a model without any training or optimization.
The authors kindly provide a notebook on Google Colab so that everyone can replicate their results. Still, I thought it would be useful to offer an even more user-friendly, non-technical interface that lowers the barrier to interacting with these low-level workings of the model.
Hence this app (https://huggingface.co/spaces/janraasch/activate-love) exists, so that anyone can steer and play with GPT-2 XL.
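Steering happens purely at inference time: the weights stay frozen and a vector is added to the residual stream while the model runs. The toy sketch below assumes a stack of residual layers written as plain Python functions (GPT-2 XL has 48 layers and a 1600-dimensional residual stream; here there are 2 layers and 2 dimensions). The names `add_at_front`, `forward` and `half_update`, adding the vector to the first token positions just before the chosen layer, and rejecting prompts shorter than the steering vector are all assumptions for illustration, not the app's implementation.

```python
from typing import Callable, List, Optional

Vector = List[float]
Acts = List[Vector]             # one residual-stream vector per token
Layer = Callable[[Acts], Acts]  # returns the update a layer writes to the stream


def add_at_front(resid: Acts, steer: Acts) -> Acts:
    """Return a copy of `resid` with steer[i] added at token position i."""
    if len(steer) > len(resid):
        # Assumption: reject prompts shorter than the steering vector.
        raise ValueError("prompt is shorter than the steering vector")
    out = [list(vec) for vec in resid]
    for i, s_vec in enumerate(steer):
        out[i] = [r + s for r, s in zip(out[i], s_vec)]
    return out


def forward(resid: Acts, layers: List[Layer],
            inject_at: Optional[int] = None,
            steer: Optional[Acts] = None) -> Acts:
    """Run a toy residual stack, optionally injecting `steer` before layer `inject_at`."""
    for idx, layer in enumerate(layers):
        if steer is not None and idx == inject_at:
            resid = add_at_front(resid, steer)
        update = layer(resid)
        # Residual connection: each layer adds its output to the stream.
        resid = [[r + u for r, u in zip(r_vec, u_vec)]
                 for r_vec, u_vec in zip(resid, update)]
    return resid


def half_update(resid: Acts) -> Acts:
    """Hypothetical frozen layer: writes half of the current stream back into it."""
    return [[0.5 * x for x in vec] for vec in resid]


if __name__ == "__main__":
    prompt = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 tokens, 2 dims
    layers = [half_update, half_update]
    print(forward(prompt, layers))
    # [[2.25, 0.0], [0.0, 2.25], [4.5, 4.5]]
    print(forward(prompt, layers, inject_at=1, steer=[[2.0, -2.0]]))
    # [[5.25, -3.0], [0.0, 2.25], [4.5, 4.5]]
```

Comparing the two runs shows that only the first token's activations changed; the layer functions, standing in for the model's weights, were never modified.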
Development
```shell
# Create virtual environment
python3 -m venv gradio-env
source gradio-env/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run app locally
gradio app.py
```