SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 6 days ago • 48
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 9 days ago • 89
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 15 days ago • 49
Reasoning Datasets Collection Distilled synthetic reasoning datasets • 7 items • Updated 24 days ago • 55
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 63
Article Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas By MaxNomic and 4 others • Jan 23 • 30
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14 • 55
High-quality Chinese training datasets Collection A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on them. • 13 items • Updated 1 day ago • 11
Article Synthetic Data Generation with FastData and Hugging Face By asoria • Jan 7 • 14
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3 • 24
Article Finding Moroccan Arabic (Darija) in Fineweb 2 By omarkamali and 3 others • Dec 8, 2024 • 22
Article Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well By rubenohana • Dec 2, 2024 • 18
OLMo 2 Collection Artifacts for the second set of OLMo models. • 22 items • Updated 16 days ago • 83