Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 988
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated Feb 20 • 255
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 219
Open LLM Leaderboard best models ❤️‍🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated Mar 20 • 582
Manual Configuration Collection 5 datasets showcase YAML configuration on HuggingFace. See docs: https://huggingface.co/docs/hub/datasets-manual-configuration. • 5 items • Updated Nov 23, 2023 • 5
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 96
MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 59
Article Experimenting with Automatic PII Detection on the Hub using Presidio Jul 10, 2024 • 24
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata Paper • 2405.09496 • Published May 15, 2024 • 3