Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 97
Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages Paper • 2404.01009 • Published Apr 1, 2024
Impact of Multilingual Alignment - Alignment Dataset Collection This is a collection of word-level alignments, restructured to ease the analysis section. • 5 items • Updated Mar 6
Lius - Translation Models Collection Collection An effort to build LLM-based translation models for the Malay Kupang language. • 13 items • Updated Jan 26 • 1