library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
datasets:
  - markury/AndroAtlas
language:
  - en

AndroGemma-alpha Model Card

Model page: AndroGemma-alpha

AndroGemma-alpha is a vision-language model (VLM) fine-tuned from Google's PaliGemma. It aims to improve how AI models represent and understand male anatomy, specifically the penis. Fine-tuning uses the AndroAtlas dataset of paired images and text.

Resources and technical documentation:

Authors: Markury

Contributors: Members of The Bulge Discord server, including enkie, Zellian, and SilasAI6609, who provided general support and contributed in detail to the system prompts and image sourcing.

Model information

Model summary

Description

AndroGemma-alpha is a fine-tuned version of PaliGemma that focuses on male anatomy, an area underrepresented in existing models. The fine-tuning data consists of image-text pairs sourced from Reddit and from non-public sources, chosen to provide detailed and diverse examples.

Model architecture

AndroGemma-alpha keeps the PaliGemma architecture, which combines a Vision Transformer image encoder with a Transformer decoder, and is fine-tuned on AndroAtlas. It supports tasks such as image captioning and visual question answering within the domain of male anatomy.

Inputs and outputs

  • Input: Image and text string, such as a prompt to caption the image, or a question.
  • Output: Generated text in response to the input, such as a caption of the image or an answer to a question.
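The input/output contract above can be sketched with the Hugging Face `transformers` PaliGemma classes. This is a minimal sketch and not the MPIC implementation. The repository id `markury/AndroGemma-alpha` is an assumption (the card does not state it), and the `caption en` / `answer en ...` prompt prefixes follow the upstream PaliGemma convention, which this fine-tune may or may not expect. `build_prompt` and `generate_text` are hypothetical helper names.

```python
from typing import Optional

# Assumption: the Hub repo id is not stated in this card; replace as needed.
MODEL_ID = "markury/AndroGemma-alpha"


def build_prompt(task: str = "caption", question: Optional[str] = None, lang: str = "en") -> str:
    """Build a PaliGemma-style task prompt (assumed convention for this fine-tune)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "answer":
        if not question:
            raise ValueError("the 'answer' task needs a question")
        return f"answer {lang} {question}"
    raise ValueError(f"unsupported task: {task!r}")


def generate_text(image_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Run one image plus prompt through the model and return only the new text."""
    # Heavy imports stay local so build_prompt works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # generate() returns the prompt tokens followed by the new tokens;
    # slice off the prompt so only the generated answer is decoded.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call would look like `generate_text("photo.jpg", build_prompt("caption"))`, where `photo.jpg` is a placeholder path. Loading the model inside the function keeps the sketch short; a real application would load it once and reuse it.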

How to Use

AndroGemma-alpha is best used through the MPIC (Markury's Paligemma Image Captioner) application, which handles inference and can be integrated into other projects. For Python inference code, refer to the MPIC source code and adapt it to your needs.

Using MPIC CLI

The MPIC CLI is the preferred way to run AndroGemma-alpha. For installation and usage details, see the MPIC repository.

Training Details

Training Data

The model was trained on the AndroAtlas dataset, which includes:

  • Text and Image Pairs: Curated from Reddit, ensuring diverse and representative samples.
  • Annotations: Detailed labels to enhance model training and understanding.
  • Focus: Male anatomy, with an emphasis on the penis.

Training Procedure

Fine-tuning used the first 5 batches of AndroAtlas (243 image-text pairs), supplemented with approximately 150 additional pairs carrying detailed human-written annotations of circumcision and erection status, for roughly 390 pairs in total. The captions were generated by GPT-4o using a specialized system prompt and later refined with Llama3-70B for consistency.
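The two-stage captioning flow described above (draft with one model, refine with another) can be sketched as a small pipeline. This is an illustrative sketch only, not the author's training script. `CaptionedPair`, `caption_dataset`, and the two callable signatures are hypothetical, and the actual calls to GPT-4o and Llama3-70B are left to whatever client the reader uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical signature: (system_prompt, image_path) -> draft caption
DraftFn = Callable[[str, str], str]
# Hypothetical signature: (draft_caption) -> refined caption
RefineFn = Callable[[str], str]


@dataclass
class CaptionedPair:
    image_path: str
    caption: str


def caption_dataset(
    image_paths: Iterable[str],
    system_prompt: str,
    draft: DraftFn,
    refine: RefineFn,
) -> List[CaptionedPair]:
    """Draft a caption per image, then pass each draft through a refinement step."""
    pairs: List[CaptionedPair] = []
    for path in image_paths:
        raw = draft(system_prompt, path)      # stage 1: e.g. a vision-capable LLM
        cleaned = refine(raw).strip()         # stage 2: e.g. a text-only LLM for consistency
        pairs.append(CaptionedPair(image_path=path, caption=cleaned))
    return pairs
```

Passing the two stages in as plain callables keeps the pipeline independent of any particular API client, so either model can be swapped or stubbed out for testing.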

For full details on the training process, refer to the training script provided in the repository.

Model Card Authors

  • Markury

Model Card Contact

  • Markury

This model card provides an overview of the AndroGemma-alpha model, including its purpose, architecture, and training details. By using this model, you contribute to the development of more inclusive and representative AI systems.