Excited to see Alibaba DAMO Academy release a multimodal dataset for vision-language pretraining on the Hub 🔥
Paper: 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (2501.00958)
Dataset: DAMO-NLP-SG/multimodal_textbook
✨ 6.5M images + 0.8B text tokens from 22k hours of instructional videos
✨ Covers subjects like math, physics, and chemistry
✨ Apache 2.0
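If you want to poke at it, here's a minimal loading sketch, assuming the repo exposes a standard datasets-compatible format (field names are assumptions, adjust to the actual schema):

```python
from datasets import load_dataset

# Stream the corpus from the Hub so the 6.5M-image dataset
# isn't downloaded in full up front.
ds = load_dataset(
    "DAMO-NLP-SG/multimodal_textbook",
    split="train",
    streaming=True,
)

# Peek at one interleaved image-text sample.
sample = next(iter(ds))
print(sample.keys())
```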