ks-lit-3m: A 3.1 million word Kashmiri text dataset for large language model pretraining • Paper • 2601.01091 • Published 15 days ago
600k-ks-ocr: A large-scale synthetic dataset for optical character recognition in Kashmiri script • Paper • 2601.01088 • Published 15 days ago