Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation Paper • 2406.16678 • Published Jun 24, 2024 • 16
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 23 days ago • 161
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 45
FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality web text data for LLM training • Running • 636
Article • Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28, 2024 • 179