dataproc5

What is this?

A data processing pipeline that uses Hugging Face datasets as an intermediate data store.

Metadata is designed to be updated like a DAG (directed acyclic graph), where some datasets depend on others.
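
As a rough illustration of DAG-style updates, the sketch below orders repos so that each one is refreshed only after the repos it depends on, and finds which repos become stale when one changes. The repo names are real dataproc5 datasets, but the dependency edges between them, the `DEPENDS_ON` map, and both helper functions are assumptions made for this example, not the pipeline's actual code.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each repo maps to the repos it is derived from.
# The edges are assumed for illustration only.
DEPENDS_ON = {
    "dataproc5/metrics-danbooru2025-alltime-tag-counts": set(),
    "dataproc5/tmp-danbooru2025-balancing-tags": {
        "dataproc5/metrics-danbooru2025-alltime-tag-counts",
    },
    "dataproc5/tmp-danbooru2025-row-priorities": {
        "dataproc5/tmp-danbooru2025-balancing-tags",
    },
}


def update_order(depends_on):
    """Return repos in an order where each repo follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(changed, depends_on):
    """Return every repo that depends on `changed`, directly or transitively."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for repo, deps in depends_on.items():
            if current in deps and repo not in stale:
                stale.add(repo)
                frontier.append(repo)
    return stale


if __name__ == "__main__":
    for repo in update_order(DEPENDS_ON):
        print("refresh", repo)
```

`graphlib.TopologicalSorter` is part of the Python standard library (3.9+) and raises `CycleError` if the map stops being acyclic, which is a cheap way to catch a broken dependency declaration early.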

Workflows are being built gradually, and there may eventually be hundreds of data repos.

How do I use it?

A tool for loading files from local storage, Hugging Face, and S3 is under development.
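
Since the tool is not published yet, the following is only a guess at how one entry point could route a path to the right backend. It assumes `s3://` and `hf://` URI prefixes for remote files and treats everything else as local; the `Source` type and `classify_source` function are hypothetical names, and the sketch only decides which backend to use rather than fetching anything.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Source:
    backend: str   # "local", "huggingface", or "s3"
    location: str  # path, repo path, or bucket/key, without the scheme


def classify_source(uri: str) -> Source:
    """Decide which storage backend a path or URI refers to."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # s3://bucket/key -> "bucket/key"
        return Source("s3", parsed.netloc + parsed.path)
    if parsed.scheme == "hf":
        # hf://datasets/org/repo/file -> "datasets/org/repo/file"
        return Source("huggingface", parsed.netloc + parsed.path)
    if parsed.scheme in ("", "file"):
        # Plain paths and file:// URIs are treated as local files.
        return Source("local", parsed.path)
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
```

A real loader would then hand the `Source` to a backend-specific reader. Note that this sketch does not handle Windows drive paths such as `C:\data`, which `urlparse` reads as a scheme.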
