---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
datasets:
- markury/AndroAtlas
language:
- en
---

# AndroGemma-alpha Model Card

**Model page:** [AndroGemma-alpha](https://huggingface.co/markury/androgemma-alpha)

AndroGemma-alpha is a fine-tuned Vision-Language Model (VLM) based on Google's PaliGemma. The model aims to enhance the representation and understanding of male anatomy, specifically the penis, in AI models. It is fine-tuned on the AndroAtlas dataset, which consists of text/image pairs.

**Resources and technical documentation:**

* [AndroAtlas Dataset](https://huggingface.co/datasets/markury/androatlas)

**Authors:** Markury

**Contributors:** Members of The Bulge Discord server, for their support and for detailed contributions to the system prompts and image sourcing.

## Model information

### Model summary

#### Description

AndroGemma-alpha is a fine-tuned version of PaliGemma, focusing on male anatomy to improve the model's understanding and representation of this underrepresented area. The fine-tuning dataset is a mix of text/image pairs sourced from Reddit and other non-public sources, chosen to provide detailed and diverse examples.

#### Model architecture

AndroGemma-alpha builds on the PaliGemma model, comprising a Transformer decoder and a Vision Transformer image encoder, fine-tuned with AndroAtlas. The model supports tasks such as image captioning and visual question answering, specific to male anatomy.

#### Inputs and outputs

* **Input:** Image and text string, such as a prompt to caption the image, or a question.
* **Output:** Generated text in response to the input, such as a caption of the image or an answer to a question.

## How to Use

AndroGemma-alpha is best used through the MPIC (Markury's Paligemma Image Captioner) application for practical inference and integration into projects.
For Python inference code, refer to the MPIC source code and adapt it to fit your needs.

### Using MPIC CLI

The MPIC CLI is the preferred method for using the AndroGemma-alpha model. For details on installation and usage, visit the [MPIC repository](https://github.com/markuryy/paligemma-image-captioner).

## Training Details

### Training Data

The AndroAtlas dataset was used for training, which includes:

- **Text and Image Pairs:** Curated from Reddit, selected for diverse and representative samples.
- **Annotations:** Detailed labels to enhance model training and understanding.
- **Focus:** Male anatomy, with an emphasis on the penis.

### Training Procedure

Fine-tuning used the first 5 batches of AndroAtlas (243 text/image pairs), supplemented with approximately 150 additional image/text pairs carrying detailed human-captioned annotations on circumcision and erection status. The captions were generated with GPT-4o using a specialized system prompt and later refined with Llama3-70B for consistency.

For full details on the training process, refer to the training script provided in the [repository](https://github.com/markuryy/paligemma-image-captioner/blob/main/finetuning/Paligemma_448_Finetune_JAX.ipynb).

## Example Outputs

Below are some examples of images and their corresponding captions generated by AndroGemma-alpha.

- caption en: "a young man with a lean physique and short dark hair, sitting comfortably with his legs slightly apart, wearing grey shorts and a visible tattoo on his arm, taking a mirror selfie with a relaxed expression, holding a smartphone in his right hand, with a light skin tone and light body hair visible on his chest and abdomen, set against a neutral indoor background with subtle lighting."
- caption en: "a headless torso of a nude man standing in a kitchen, with hands resting on his thighs, exposing his genitals, including the penis and testicles, and a visible abdominal hair line, the man has short light-colored hair on his head, and his body is covered with light body hair, the background features a white tiled wall and a white appliance."
- caption en: "a naked man standing outdoors, holding a yellow saw with a black handle, looking up at the camera with a slight smile, his short dark hair and beard visible, his muscular physique and defined abs prominent, his penis and testicles visible, his right hand resting on a wooden fence, natural light coming from behind, background featuring dense forest and trees with sparse foliage."
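
The captions above begin with `caption en`, which looks like the PaliGemma captioning task prefix. For readers adapting Python inference rather than using MPIC, the following is a minimal sketch only and is not taken from the MPIC source. It assumes the checkpoint at `markury/androgemma-alpha` loads with the standard `transformers` PaliGemma classes (as the card's `library_name: transformers` metadata suggests) and that `torch`, `transformers`, and `Pillow` are installed. The helper names `build_prompt` and `caption_image` are hypothetical.

```python
# Assumption: model id taken from the model page URL; not verified to hold
# transformers-format weights.
MODEL_ID = "markury/androgemma-alpha"


def build_prompt(task: str = "caption", lang: str = "en") -> str:
    """Build a PaliGemma-style task prefix such as 'caption en'."""
    return f"{task} {lang}"


def caption_image(image_path: str, max_new_tokens: int = 128) -> str:
    """Generate a caption for one image (downloads the model on first use)."""
    # Heavy dependencies are imported lazily so build_prompt stays usable
    # even when they are not installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(), images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the newly generated text is decoded.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `caption_image("photo.jpg")` with a local image path (a placeholder here) would return the generated caption as a string. Greedy decoding (`do_sample=False`) is an illustrative choice, not necessarily what MPIC uses.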