Hynek Kydlicek

hynky

AI & ML interests

Data-processing

Recent Activity

new activity 15 days ago

HuggingFaceFW/finepdfs:Question about data ordering/shuffling in the FinePDFs parquet files

updated a Space 28 days ago

HuggingFaceFW/README

updated a collection 28 days ago

📄 FinePDFs

Organizations

liked a dataset 29 days ago

HuggingFaceFW/finetranslations

Viewer • Updated 29 days ago • 3.33B • 70.3k • 265

liked a Space about 1 month ago

FinePDFs: Liberating 3T of the finest tokens from PDFs

📄

liked a Space 2 months ago

Evaluation Guidebook

📝

267

Display benchmark evaluation data for LLMs

liked a dataset 5 months ago

HuggingFaceFW/finepdfs

Viewer • Updated 29 days ago • 476M • 39.7k • 815

liked a Space 5 months ago

Bringing paper to life: A modern template for scientific writing

📝

Generate publish-ready scientific papers with modern templates

liked a Space 12 months ago

The Ultra-Scale Playbook

🌌

3.67k

The ultimate guide to training LLM on large GPU Clusters

liked 2 datasets about 1 year ago

data-is-better-together/fineweb-c

Viewer • Updated Jul 8, 2025 • 88.7k • 1.27k • 58

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 105k • 744

liked a Space about 1 year ago

Number Tokenization Blog

📈

108

Explore how tokenization affects arithmetic in LLMs

liked a dataset about 1 year ago

CohereLabs/Global-MMLU

Viewer • Updated Aug 14, 2025 • 602k • 13.5k • 144

liked a dataset over 1 year ago

ClusterlabAi/InstAr-500k

Viewer • Updated Jul 30, 2024 • 481k • 57 • 15

liked a Space over 1 year ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

liked a dataset over 1 year ago

LLM360/TxT360

Updated May 26, 2025 • 32.5k • 248

liked 2 Spaces over 1 year ago

Hub LFS Analysis

📈

An analysis of LFS files on the Hub.

TxT360: Trillion Extracted Text

📖

132

Explore and analyze the TxT360 dataset for LLM pre-training

liked a dataset over 1 year ago

Cleanlab/bad_data_gsm8k_svamp.csv

Viewer • Updated Apr 25, 2024 • 34 • 60 • 3

liked a Space over 1 year ago

Datasets Metrics Explorer

📊

Launch an interactive demo interface

liked 3 datasets over 1 year ago
