Hugging Face
Storage Buckets

Store models, datasets, and artifacts with simple per-TB pricing. Built-in CDN, Xet deduplication, and no git overhead.

Trusted by more than 10,000 AI teams

Storage

Storage built for AI teams

Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No git overhead.

Per-TB pricing with built-in CDN and deduplication speedups.
No Git constraints: commit-free sync and fast object updates.
Designed for ML workflows: datasets, checkpoints, model artifacts.

terminal - bash

# Create a storage bucket

$ hf buckets create acme-corp/training-data

✓ Bucket created: hf://buckets/acme-corp/training-data

✓ Visibility: private · Region: us-east-1

# Sync training data to the bucket

$ hf sync ./checkpoints/ hf://buckets/acme-corp/training-data

Scanning local files... 12,847 files (2.4 TB)

Xet dedup: 62% deduplicated : uploading 912 GB (saved 1.5 TB)

█████████████████████████ 78% 714/912 GB · 2.1 GB/s · ETA 1m 34s

Xet Technology

Next-gen large-scale storage for AI

Xet uses content-defined chunking to break files into byte-level chunks and deduplicates across your entire bucket. When you retrain a model and only 5% of weights change, only that 5% is re-uploaded.

Raw + processed dataset: stored once, billed once*
4x less data per upload, verified with real-world workloads

*Requires Enterprise or Enterprise Plus plan

Traditional S3 upload: 8 / 8 chunks uploaded
Xet deduplicated upload: 1 / 8 chunks uploaded (only the changed chunk is sent; the other 7 are already stored)
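
To make the chunking idea concrete, here is a minimal Python sketch of content-defined chunking with deduplication. It is not Xet's implementation: the gear-style rolling hash, the chunk-size limits, and the names `split_chunks` and `ChunkStore` are all assumptions chosen for illustration. The point it shows is that chunk boundaries depend on the bytes themselves, so a small edit changes only the chunks around it.

```python
import hashlib
import random

# Hypothetical parameters, chosen for this sketch only.
_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # one random value per byte
MASK = 0xFC000000  # cut when the top 6 hash bits are zero (about 1 in 64 positions)
MIN_SIZE = 32      # by then the 32-bit hash reflects only the last 32 bytes
MAX_SIZE = 256     # hard upper bound on chunk length


def split_chunks(data: bytes) -> list:
    """Split data at boundaries selected by a rolling hash of recent bytes."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left drops the influence of bytes older than 32 positions.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Toy in-memory store that keeps each distinct chunk once."""

    def __init__(self):
        self.chunks = {}

    def upload(self, data: bytes) -> int:
        """Store unseen chunks and return the number of bytes actually sent."""
        sent = 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                sent += len(chunk)
        return sent


data_rng = random.Random(0)
v1 = bytes(data_rng.getrandbits(8) for _ in range(64 * 1024))
v2 = v1[:32768] + b"EDIT" + v1[32772:]  # overwrite 4 bytes in the middle

store = ChunkStore()
first = store.upload(v1)
second = store.upload(v2)
print(f"first upload: {first} of {len(v1)} bytes")
print(f"second upload: {second} of {len(v2)} bytes")
```

In this toy setup the second upload sends only the chunks whose content changed. A production system would use much larger chunks and a persistent index, but the boundary-selection idea is the same.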

Pricing

Transparent, volume-based pricing

Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.

Price comparison (per TB per month):
AWS S3: $23
Backblaze Overdrive: $15
HF Hub: $8–12

Base: $12 /TB/mo public repositories · $18 /TB/mo private repositories
50TB+ (20% off): $10 /TB/mo public repositories · $16 /TB/mo private repositories
200TB+ (25% off): $9 /TB/mo public repositories · $14 /TB/mo private repositories
500TB+ (33% off): $8 /TB/mo public repositories · $12 /TB/mo private repositories
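
As a rough illustration of how the tiers translate into a monthly bill, the Python sketch below looks up the rate for a given volume. It rests on two assumptions the page does not confirm: that the rate of the tier you reach applies to your entire stored volume (rather than only to the TB above each threshold), and that thresholds are inclusive. The names `TIERS` and `monthly_cost` are hypothetical.

```python
# (minimum TB, public $/TB/mo, private $/TB/mo), copied from the prices listed above.
TIERS = [
    (500, 8, 12),
    (200, 9, 14),
    (50, 10, 16),
    (0, 12, 18),
]


def monthly_cost(storage_tb: float, private: bool = False) -> float:
    """Estimate the monthly bill in USD.

    Assumption: the whole volume is billed at the rate of the highest tier reached.
    """
    for min_tb, public_rate, private_rate in TIERS:
        if storage_tb >= min_tb:
            rate = private_rate if private else public_rate
            return storage_tb * rate
    raise ValueError("storage_tb must be non-negative")


print(monthly_cost(10))                 # 120
print(monthly_cost(240, private=True))  # 3360
```

If billing turns out to be marginal per tier, the lookup would need to sum each band separately; treat the numbers here as an estimate only.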

Data Storage

Assemble training data at any scale

Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.

Immediate availability on upload, no queued commits
Batch API processes thousands of files in a single call
Raw + processed datasets with dedup = no double billing*

*Requires Enterprise or Enterprise Plus plan

crawl-2026-jan/ · 48 TB · 2.1M files · synced
annotations-v3/ · 12 TB · 890K files · synced
synthetic-pairs/ · 6 TB · 340K files · 75%

Xet dedup: 66 TB stored → billed for 41 TB*

Compute Agnostic

Your data, independent of your compute

Your data lives in one neutral home, not inside a single cloud. Point training and inference at whichever provider has capacity or the best price, and switch without re-uploading petabytes or paying egress to leave.

Train on AWS, GCP, Nebius, or your own cluster from one bucket
Never locked to one provider's prices or capacity
Pre-warmed CDN keeps data next to your GPUs, wherever they run

Your buckets: hf://buckets/acme · 240 TB · one copy
Stream to any compute: AWS · GCP · Nebius · CoreWeave · Lambda · Your cluster

CDN

Built-in CDN for blazing fast access

Every bucket includes a CDN. Warm a localized cache close to where you compute for ultra-fast streaming and downloads. Egress is included up to a generous 8:1 ratio to your total storage.

Pre-warm cache in any cloud region you need
Our CDN is deployed inside GCP and AWS networks
Egress included up to 8x your storage
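
To put the 8:1 figure in concrete terms, the short Python sketch below computes the included egress for a stored volume. The page does not state the period over which the ratio is measured, so the sketch assumes egress and storage are counted over the same billing period. The function names are hypothetical, and the 240 TB and 48 TB inputs are borrowed from the illustrations elsewhere on this page.

```python
EGRESS_RATIO = 8  # the 8:1 egress-to-storage ratio quoted above


def included_egress_tb(storage_tb: float) -> float:
    """Egress covered at no extra cost for a given stored volume, in TB."""
    return EGRESS_RATIO * storage_tb


def full_reads_covered(storage_tb: float, dataset_tb: float) -> int:
    """How many complete reads of one dataset fit inside the included egress."""
    return int(included_egress_tb(storage_tb) // dataset_tb)


print(included_egress_tb(240))      # 1920
print(full_reads_covered(240, 48))  # 40
```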

More providers coming soon

Coding Agents

Give your coding agents persistent storage

Coding agents run in ephemeral environments, but their outputs shouldn't vanish. Checkpoints, benchmark results, generated datasets: one hf sync command in your agent's bash tool is all it takes.

Pre-warmed CDN and no git overhead for fast reads and writes
Persist artifacts across ephemeral CI runs and terminal sessions
Install the official HF CLI skill and your agent knows every command
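
For agents that drive tools from Python rather than a raw shell, the sketch below wraps the sync step. It uses the `hf buckets sync <local> <bucket>` form that appears in this section's terminal transcript. The helper names and the `dry_run` flag are hypothetical and exist only on the Python side; nothing here assumes extra CLI options.

```python
import shlex
import subprocess


def build_sync_command(local_dir: str, bucket_uri: str) -> list:
    """Build the argument list for syncing a local directory to a bucket."""
    if not bucket_uri.startswith("hf://buckets/"):
        raise ValueError("bucket_uri must start with hf://buckets/")
    return ["hf", "buckets", "sync", local_dir, bucket_uri]


def sync_outputs(local_dir: str, bucket_uri: str, dry_run: bool = True) -> str:
    """Run the sync, or only return the command line when dry_run is True.

    dry_run is a Python-side switch in this sketch, not a CLI flag.
    """
    cmd = build_sync_command(local_dir, bucket_uri)
    if not dry_run:
        # Requires the hf CLI to be installed and authenticated.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


print(sync_outputs("./experiment_outputs/", "hf://buckets/acme/training-artifacts-v2"))
```

Passing the command as a list avoids shell quoting problems when an agent generates directory names with spaces.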

terminal - bash

# Agent creates a bucket and syncs experiment outputs

● Bash(hf buckets create training-artifacts-v2)

└ Bucket created: hf://buckets/acme/training-artifacts-v2

● Bash(hf buckets sync ./experiment_outputs/ hf://buckets/acme/training-artifacts-v2)

└ Sync plan: ./experiment_outputs/ → hf://buckets/acme/training-artifacts-v2

Uploads: 16

Downloads: 0

● All done. Bucket acme/training-artifacts-v2 synced (14.3 GB total):

- 3 checkpoints (model.safetensors, optimizer.pt, config.json)

- 6 parquet shards (data/train/ and data/eval/)

- 1 training log

Bucket is live at hf://buckets/acme/training-artifacts-v2

Enterprise-grade security at every layer

AES-256 Encryption

End-to-end encryption at rest and in transit

Audit Logs

Full visibility into every access event

SSO & RBAC

Enterprise SSO with role-based access control

US & EU Regions

Choose where your data lives

Get started with HF Storage Buckets

Start with buckets, sync your AI data, and unlock object storage built for ML workflows.

Create a Bucket: Get started for free with your first bucket
Get a Storage package: Simple per-TB pricing that scales with you
Enterprise: Dedicated governance and shared quotas at scale
