markury committed
Commit 9d3552e
1 Parent(s): 6a3ccda

Create README.md

Files changed (1): README.md (+73, -0)
README.md ADDED

---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
datasets:
- markury/AndroAtlas
language:
- en
---

# AndroGemma-alpha Model Card

**Model page:** [AndroGemma-alpha](https://huggingface.co/markury/androgemma-alpha)

AndroGemma-alpha is a fine-tuned Vision-Language Model (VLM) based on Google's PaliGemma. The model aims to improve the representation and understanding of male anatomy, specifically the penis, in AI models. The fine-tuning uses the AndroAtlas dataset, which pairs images with text, to provide comprehensive training data for this purpose.

**Resources and technical documentation:**

* [AndroAtlas Dataset](https://huggingface.co/datasets/markury/androatlas)

**Authors:** Markury

**Contributors:** Members of The Bulge Discord server, including enkie, Zellian, and SilasAI6609, for general support and for detailed contributions to the system prompts and image sourcing.

## Model information

### Model summary

#### Description

AndroGemma-alpha is a fine-tuned version of PaliGemma, focusing on male anatomy to improve the model's understanding and representation of this underrepresented area. The fine-tuning dataset includes a mix of text and image pairs sourced from Reddit and other non-public sources, providing detailed and diverse examples.

#### Model architecture

AndroGemma-alpha builds on the PaliGemma model, which comprises a Transformer decoder and a Vision Transformer image encoder, and is fine-tuned with AndroAtlas. The model supports tasks such as image captioning and visual question answering, specialized for male anatomy.
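
As a quick sanity check of these two components, the checkpoint can be loaded with the stock `transformers` PaliGemma classes. This is only a sketch and assumes the uploaded weights follow the standard PaliGemma layout; the sub-module names come from the `transformers` implementation, not from this repository.

```python
from transformers import PaliGemmaForConditionalGeneration

# Assumes the checkpoint uses the standard PaliGemma layout from transformers.
model = PaliGemmaForConditionalGeneration.from_pretrained("markury/androgemma-alpha")

print(type(model.vision_tower).__name__)    # Vision Transformer (SigLIP) image encoder
print(type(model.language_model).__name__)  # Gemma decoder-only language model
```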

#### Inputs and outputs

* **Input:** Image and text string, such as a prompt to caption the image, or a question.
* **Output:** Generated text in response to the input, such as a caption of the image or an answer to a question.

## How to Use

AndroGemma-alpha is most easily used through the MPIC (Markury's Paligemma Image Captioner) application for practical inference and integration into projects. For Python inference code, refer to the MPIC source code and adapt it to fit your needs; a minimal `transformers`-based sketch is also shown below.
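
The following sketch runs inference directly with the Hugging Face `transformers` API. It assumes the checkpoint loads with the stock PaliGemma classes and that a prefix-style prompt such as `caption en` behaves as it does for the base model; the image path is a placeholder.

```python
from PIL import Image
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "markury/androgemma-alpha"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
prompt = "caption en"                             # assumed PaliGemma-style prefix prompt

inputs = processor(images=image, text=prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens (the caption/answer).
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```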

### Using MPIC CLI

The MPIC (Markury's Paligemma Image Captioner) CLI is the preferred method for using the AndroGemma-alpha model. For details on installation and usage, visit the [MPIC repository](https://github.com/markuryy/paligemma-image-captioner).

## Training Details

### Training Data

The AndroAtlas dataset was used for training. It includes:
- **Text and Image Pairs:** Curated from Reddit, ensuring diverse and representative samples.
- **Annotations:** Detailed labels to enhance model training and understanding.
- **Focus:** Male anatomy, with an emphasis on the penis.
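
To inspect the training data yourself, the dataset can be pulled from the Hub with the `datasets` library. This is only a sketch: the `train` split and the column names are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Split name "train" is an assumption; adjust if the dataset uses a different split.
dataset = load_dataset("markury/AndroAtlas", split="train")

example = dataset[0]
print(example.keys())  # inspect the actual column names (image/text fields)
```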

### Training Procedure

The fine-tuning process used the first 5 batches of AndroAtlas (243 text/image pairs), supplemented with approximately 150 additional image/text pairs carrying detailed human-written annotations on circumcision and erection status. The captions were generated using a specialized system prompt with GPT-4o and later refined with Llama3-70B for consistency.

For full details on the training process, refer to the training script provided in the [repository](https://github.com/markuryy/paligemma-image-captioner/blob/main/finetuning/Paligemma_448_Finetune_JAX.ipynb).
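
The linked notebook fine-tunes in JAX. Purely as an illustration of how a single image/caption pair becomes a training step, here is a minimal PyTorch/`transformers`-style sketch; it assumes the processor supports the standard PaliGemma `suffix` argument (which builds the label tensor with the prompt tokens masked), and the file path, prompt, and caption are placeholders.

```python
from PIL import Image
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "markury/androgemma-alpha"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
model.train()

image = Image.open("pair_0001.jpg").convert("RGB")  # placeholder training image
prompt = "caption en"                                # assumed prefix-style prompt
caption = "a detailed, human-written caption"        # placeholder target text

# With `suffix`, the processor also returns `labels`, with the prompt portion
# masked so the loss is computed only on the caption tokens.
batch = processor(images=image, text=prompt, suffix=caption, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```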

## Model Card Authors

- **Markury**

## Model Card Contact

- **Markury**

This model card provides an overview of the AndroGemma-alpha model, including its purpose, training details, and intended usage. By using this model, you contribute to the development of more inclusive and representative AI systems.