Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15 • 161
Article: Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20 • 61
Article: Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • Jun 24 • 30
Article: Experimenting with Automatic PII Detection on the Hub using Presidio • Jul 10 • 23
Article: How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • By chilijung • May 31 • 10
Article: Synthetic dataset generation techniques: generating custom sentence similarity data • By davanstrien • May 23 • 14
Article: 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets • By dvilasuero • Jun 4 • 69
Article: Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B • Apr 4 • 23
Collection: Arctic-embed: A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14
Article: DuckDB: run SQL queries on 50,000+ datasets on the Hugging Face Hub • Jun 7, 2023 • 4