SWEb: A Large Web Dataset for the Scandinavian Languages Paper • 2410.04456 • Published Oct 6, 2024 • 1
R-grams: Unsupervised Learning of Semantic Units in Natural Language Paper • 1808.04670 • Published Aug 14, 2018
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 16
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Paper • 2404.19622 • Published Apr 30, 2024 • 2