Hugging Face
Storage Buckets

Store models, datasets, and artifacts with simple per-TB pricing. Built-in CDN, Xet deduplication, and no git overhead.

Trusted by more than 10,000 AI teams

ibm
google
mistralai
arcee-ai
microsoft
jasperai
Storage

Storage built for AI teams

Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No git overhead.

  • Per-TB pricing with built-in CDN and deduplication speedups.

  • No Git constraints: commit-free sync and fast object updates.

  • Designed for ML workflows: datasets, checkpoints, model artifacts.

terminal - bash
# Create a storage bucket
$ hf buckets create acme-corp/training-data
Bucket created: hf://buckets/acme-corp/training-data
Visibility: private · Region: us-east-1
# Sync training data to the bucket
$ hf buckets sync ./checkpoints/ hf://buckets/acme-corp/training-data
Scanning local files... 12,847 files (2.4 TB)
Xet dedup: 62% deduplicated · uploading 912 GB (saved 1.5 TB)
█████████████████████████ 78% 714/912 GB · 2.1 GB/s · ETA 1m 34s
Xet Technology

Next-gen large-scale storage for AI

Xet uses content-defined chunking to split files into variable-size chunks at byte-level boundaries, then deduplicates those chunks across your entire bucket. When you retrain a model and only 5% of the weights change, only the chunks containing that 5% are re-uploaded.

  • Raw + processed dataset: stored once, billed once*

  • 4x less data per upload, verified with real-world workloads

*Requires Enterprise or Enterprise Plus plan
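
To make the chunking idea concrete, here is a minimal Python sketch of content-defined chunking with a hash-addressed chunk store. It is a toy illustration, not Xet's actual algorithm: the gear-style rolling hash, the chunk-size limits, the in-memory `store` dictionary, and the `chunk` and `upload` helpers are all assumptions made for this example.

```python
import hashlib
import random

# Toy parameters chosen for illustration only (not Xet's real settings).
MASK = (1 << 11) - 1        # cut when the low 11 hash bits are zero (~2 KiB spacing)
MIN_CHUNK = 256             # never cut before this many bytes
MAX_CHUNK = 16 * 1024       # always cut at this many bytes

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a rolling hash of the content itself."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> int:
    """Store only chunks not seen before; return the bytes actually sent."""
    sent = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            sent += len(c)
    return sent


store: dict[str, bytes] = {}
v1 = random.Random(1).randbytes(400_000)
# Simulated "retrain": overwrite a contiguous 5% region in the middle of the file.
patch = random.Random(2).randbytes(20_000)
v2 = v1[:200_000] + patch + v1[220_000:]

first = upload(v1, store)
second = upload(v2, store)
print(f"first upload:  {first} bytes")
print(f"second upload: {second} bytes ({second / len(v2):.1%} of the file)")
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, the chunker falls back into step with the old version shortly after the modified region, so the second upload sends the changed bytes plus a small amount of overlap at the edges.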

[Diagram: Traditional S3 upload (8 / 8 chunks uploaded) vs. Xet deduplicated upload (1 / 8 chunks uploaded). Gray = already stored · Purple = only the changed chunk]

Pricing

Transparent, volume-based pricing

Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.

Storage price per TB per month, compared:

  AWS S3                $23
  Backblaze Overdrive   $15
  HF Hub                $8–12

HF Hub volume tiers (per TB per month):

  Tier               Public repositories   Private repositories
  Base               $12                   $18
  50TB+ (20% off)    $10                   $16
  200TB+ (25% off)   $9                    $14
  500TB+ (33% off)   $8                    $12
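
As a rough way to estimate a monthly bill from the rates listed above, the following Python sketch looks up the tier for a given volume. It assumes the tier rate applies to the whole stored volume rather than only to the TB above each threshold; the page does not say which, so treat that as an assumption. The `TIERS` list and `monthly_cost` function are names made up for this example.

```python
# (minimum TB, public $/TB/mo, private $/TB/mo), copied from the listed rates.
TIERS = [
    (500, 8, 12),
    (200, 9, 14),
    (50, 10, 16),
    (0, 12, 18),
]


def monthly_cost(tb: float, private: bool) -> float:
    """Estimate the monthly storage cost in dollars.

    Assumption: the rate of the highest tier reached applies to every TB.
    """
    for min_tb, public_rate, private_rate in TIERS:
        if tb >= min_tb:
            return tb * (private_rate if private else public_rate)
    raise ValueError("tb must be non-negative")


for tb in (10, 66, 250, 600):
    public = monthly_cost(tb, private=False)
    private = monthly_cost(tb, private=True)
    print(f"{tb:>4} TB  public ${public:>8,.0f}  private ${private:>8,.0f}")
```

If billing turns out to be marginal (each tier's rate applying only to the TB inside that band), the lookup would need to sum per-band costs instead.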
Data Storage

Assemble training data at any scale

Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.

  • Immediate availability on upload, no queued commits

  • Batch API processes thousands of files in a single call

  • Raw + processed datasets with dedup = no double billing*

*Requires Enterprise or Enterprise Plus plan

  crawl-2026-jan/     48 TB · 2.1M files    synced
  annotations-v3/     12 TB · 890K files    synced
  synthetic-pairs/    6 TB · 340K files     75%

Xet dedup: 66 TB stored → billed for 41 TB*

CDN

Built-in CDN for blazing fast access

Every bucket includes a CDN. Pre-warm a localized cache close to where you compute for ultra-fast streaming and downloads. Egress is included up to 8 times your total storage (an 8:1 ratio).

  • Pre-warm cache in any cloud region you need

  • Our CDN is deployed inside GCP and AWS networks

  • Egress included up to 8:1 of your storage

More providers coming soon

US-EAST · EU-WEST · ASIA · SA-EAST
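
The 8:1 egress allowance can be worked out with simple arithmetic. The Python sketch below assumes that egress and storage are measured over the same billing period, which the page does not state. The function names are hypothetical, chosen for this example only.

```python
EGRESS_RATIO = 8  # from the text: egress is included up to 8:1 of total storage


def included_egress_tb(stored_tb: float) -> float:
    """TB of egress covered by the allowance for a given stored volume."""
    return stored_tb * EGRESS_RATIO


def overage_tb(stored_tb: float, egress_tb: float) -> float:
    """TB of egress beyond the allowance (0 if within it)."""
    return max(0.0, egress_tb - included_egress_tb(stored_tb))


stored = 66  # TB stored, e.g. the total of the example folders on this page
print(f"included egress: {included_egress_tb(stored)} TB")
print(f"overage at 500 TB egress: {overage_tb(stored, 500)} TB")
print(f"overage at 600 TB egress: {overage_tb(stored, 600)} TB")
```

Put another way, under this reading the allowance covers streaming the full contents of a bucket to your GPUs eight times over.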
Coding Agents

Give your coding agents persistent storage

Coding agents run in ephemeral environments, but their outputs shouldn't vanish. Checkpoints, benchmark results, generated datasets: one hf buckets sync command in your agent's bash tool is all it takes.

  • Pre-warmed CDN and no git overhead for fast reads and writes

  • Persist artifacts across ephemeral CI runs and terminal sessions

  • Install the official HF CLI skill and your agent knows every command

terminal - bash
# Agent creates a bucket and syncs experiment outputs
Bash(hf buckets create training-artifacts-v2)
└ Bucket created: hf://buckets/acme/training-artifacts-v2
Bash(hf buckets sync ./experiment_outputs/ hf://buckets/acme/training-artifacts-v2)
└ Sync plan: ./experiment_outputs/ → hf://buckets/acme/training-artifacts-v2
Uploads: 16
Downloads: 0
All done. Bucket acme/training-artifacts-v2 synced (14.3 GB total):
- 3 checkpoints (model.safetensors, optimizer.pt, config.json)
- 6 parquet shards (data/train/ and data/eval/)
- 1 training log
Bucket is live at hf://buckets/acme/training-artifacts-v2

Enterprise-grade security at every layer

AES-256 Encryption

End-to-end encryption at rest and in transit

Audit Logs

Full visibility into every access event

SSO & RBAC

Enterprise SSO with role-based access control

US & EU Regions

Choose where your data lives

SOC 2 Type II | GDPR Compliant