FlexTok: Resampling Images into 1D Token Sequences of Flexible Length


Official implementation and pre-trained models for:
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length, arXiv 2025
Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan

Installation

For install instructions, please see https://github.com/apple/ml-flextok.

Usage

To load the 8-channel VAE-GAN directly from the Hugging Face Hub and autoencode a sample image, run:

from diffusers.models import AutoencoderKL
from flextok.utils.demo import imgs_from_urls

vae = AutoencoderKL.from_pretrained(
    'EPFL-VILAB/flextok_vae_c8', low_cpu_mem_usage=False
).eval()

# Load example images of shape (B, 3, H, W), normalized to [-1,1]
imgs = imgs_from_urls(urls=['https://storage.googleapis.com/flextok_site/nb_demo_images/0.png'])

# Autoencode with the VAE
latents = vae.encode(imgs).latent_dist.sample() # Shape (B, 8, H//8, W//8)
reconst = vae.decode(latents).sample # Shape (B, 3, H, W)
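
The shape comments above imply 8 latent channels and an 8x reduction along each spatial dimension. The sketch below works through that arithmetic in plain Python so the expected latent size can be checked before running the model. The helper names `latent_shape` and `compression_factor` are hypothetical and not part of `flextok` or `diffusers`. The floor division mirrors the `H//8, W//8` comment and assumes image sides that are multiples of 8.

```python
def latent_shape(batch, height, width, channels=8, downsample=8):
    """Latent shape for an image batch of shape (batch, 3, height, width).

    Assumes the layout stated in the usage comments: (B, 8, H//8, W//8).
    """
    return (batch, channels, height // downsample, width // downsample)


def compression_factor(height, width, channels=8, downsample=8):
    """Ratio of input values (3 * H * W) to latent values for one image."""
    _, c, h, w = latent_shape(1, height, width, channels, downsample)
    return (3 * height * width) / (c * h * w)


print(latent_shape(1, 256, 256))     # (1, 8, 32, 32)
print(compression_factor(256, 256))  # 24.0
```

For a 256x256 RGB input, the latent tensor holds 8 x 32 x 32 = 8192 values against 196608 input values, so `latents.shape` in the snippet above should match `latent_shape(B, H, W)` for the loaded batch.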

Citation

If you find this repository helpful, please consider citing our work:

@article{flextok,
    title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
    author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
    journal={arXiv 2025},
    year={2025},
}

License

The model weights in this repository are released under the Apple Model License for Research.
