Model Card for Reddit-memes-pixtral-12B-v4

This repo contains LoRA adapters for Pixtral-12B. The adapters were created by fine-tuning the model on top Reddit comments, i.e. comments that received upvotes from the community.

Model Details

Based on the wonderful Pixtral-12B model from mistral-community. Pixtral can process both images and text as input and outputs text. This repo contains adapters that turn the original model into one whose purpose is to generate engaging comments for posts on the subreddit r/memes.

This subreddit is about memes, obviously. There are strict rules for content posted there: essentially, each post consists of a title (a short text) and an image that is supposed to be an original meme. These images often have text imprinted on them, so it's interesting to see how well multimodal LLMs can process text found in images.

Be advised: the model is likely to generate text aimed at maximizing comment "likes", so it might come up with inappropriate content such as relatable fake personal stories or pure engagement bait. If you find that hard to believe, remember: it was fine-tuned on Reddit comments.

Example:

Come up with a comment that will get upvoted by the community for a reddit post in r/memes. Provide the comment body with text and nothing else. The post has title: 'huge improvement.' and image:

[post image: the meme]

Note: the model uses only the image; I added the username to credit the person who posted it, and the title to clarify how a meme is structured on this subreddit.

For the post above, the current model produced the following output:

[screenshot of the model's generated comment]

which became the most upvoted comment on the post. Honestly, I did not expect that, but predicting these things is not easy.

There are more examples I tried that were "approved" by the community. However, this isn't always the case, and success depends on factors outside the commenter's control, such as the post's popularity and timing. In general, for most posts I tried, the generated comments were reasonable by my expectations.

How to Get Started with the Model

Use the code below to get started with the model:

from peft import PeftModel
from transformers import AutoProcessor, LlavaForConditionalGeneration
from PIL import Image
import torch

# Load the base model and processor (this part downloads Pixtral-12B)
model_id = "mistral-community/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16, device_map="cuda") # or "mps" for macOS, or "auto" for multiple Nvidia GPUs
processor = AutoProcessor.from_pretrained(model_id)

# Load the LoRA adapters on top of the base model (this part downloads the current repo)
lora_model = PeftModel.from_pretrained(model, "AlexandrosChariton/Reddit-memes-pixtral-12B-v4")

# You need a meme in image format to run the model
image_path = "meme_image.png" # or webp, jpg, jpeg
meme_title = "I hate it when this happens to me"

# Shrink the image to fit within 512x512 (thumbnail preserves the aspect ratio).
# Not strictly necessary, but 512x512 was used during training.
image = Image.open(image_path).convert("RGB")
image.thumbnail((512, 512))

# The chat template is applied to the prompt manually here
PROMPT = f"<s>[INST]Come up with a comment that will get upvoted by the community for a reddit post in r/memes. Provide the comment body with text and nothing else. The post has title: '{meme_title}' and image:\n[IMG][/INST]"
inputs = processor(text=PROMPT, images=image, return_tensors="pt").to("cuda") # or "mps" for macOS

generate_ids = lora_model.generate(**inputs, max_new_tokens=650)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)
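
Note that decoding the full generate_ids returns the prompt together with the generated comment. If you only want the new text, a standard transformers pattern (not specific to this repo) is to slice off the prompt tokens before decoding:

# Decode only the newly generated tokens, dropping the echoed prompt
prompt_len = inputs["input_ids"].shape[1]
comment = processor.batch_decode(generate_ids[:, prompt_len:], skip_special_tokens=True)[0].strip()
print(comment)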

Model Description

I fine-tuned the model on a sample of popular comments from posts, extracted using Reddit's Python API. I started with comments with more than 10 upvotes and did some basic filtering, removing long and low-quality comments based on my personal standards. About 3% of the model's parameters were trainable. I used 1.5k posts and 12k comments in total as training data. The prompt was pretty vanilla, without any carefully crafted prompt-engineering tricks.
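
To illustrate the collection step, here is a minimal sketch using PRAW, the Python Reddit API wrapper. The credentials, listing choice, and length threshold are assumptions for demonstration; the actual filtering was manual and stricter:

import praw

# Hypothetical credentials: replace with your own Reddit app keys
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET", user_agent="meme-comment-scraper")

dataset = []
for submission in reddit.subreddit("memes").top(time_filter="month", limit=1500):
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    for comment in submission.comments:
        # Keep upvoted, reasonably short comments; the 300-char cap is illustrative
        if comment.score > 10 and len(comment.body) < 300:
            dataset.append({"title": submission.title, "image_url": submission.url, "comment": comment.body, "score": comment.score})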

You may notice that this repo has a v4 in the title. That's because I ran this training a number of times, trying my best to filter the training data as much as possible without greatly compromising the goal of getting upvotes. This decision was made after previous versions ended up producing text that was pure misinformation or otherwise inappropriate; I thought it best to avoid such extreme cases, so I did not make the other versions public. Still, the generated text is likely to be weird (for lack of a better word), given that the language model was exposed to popular Reddit comments. I also filtered out posts that one might consider controversial, even when they were quite popular in the preceding few days.
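
For reference, a LoRA setup along these lines could look like the sketch below. The rank, alpha, and target modules are assumptions (they are not documented in this card), picked so that roughly a few percent of the parameters become trainable:

from peft import LoraConfig, get_peft_model

# Hypothetical adapter config: rank and target modules are assumptions
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
trainable_model = get_peft_model(model, lora_config)  # `model` is the base model loaded earlier
trainable_model.print_trainable_parameters()  # reports the trainable fraction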

  • Developed by: me
  • Funded by: me :(

Uses

For fun, and for testing the limits of reading text imprinted on images, as commonly found in memes :)
