Training data details
#8
by floschne
Hi, and thanks for this amazing work!
Could you please elaborate on the training data? I have the following questions :-)
- "558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP." --> What do you mean by "captioned by BLIP"? All of the mentioned datasets already come with captions, don't they?
- "40K ShareGPT data." --> ShareGPT is text-only. Does that mean you trained on text-only CLM data, or do you actually mean ShareGPT4V, which is multi-modal?
- I assume that most, if not all, of the textual data is in English, correct?