---
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: MaskBit-Tokenizer-16bits
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 1.29
    - name: InceptionScore
      type: InceptionScore
      value: 193.6
    - name: LPIPS
      type: LPIPS
      value: 0.278
    - name: PSNR
      type: PSNR
      value: 21.8
    - name: SSIM
      type: SSIM
      value: 0.58
    - name: CodebookUsage
      type: CodebookUsage
      value: 1
---
This model is the MaskBit tokenizer with a 16-bit vocabulary. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
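
To make these figures concrete, the short sketch below works out the token grid and vocabulary size they imply. It is illustrative arithmetic only, not code from the MaskBit repository. The variable names are hypothetical, and it assumes square inputs and that "16 bits" means each token is a 16-bit code.

```python
# Illustrative arithmetic only; not part of the MaskBit codebase.
image_resolution = 256     # from the model card
downsampling_factor = 16   # from the model card
bits_per_token = 16        # from the model card

# Assumption: inputs are square, so the token grid is square as well.
tokens_per_side = image_resolution // downsampling_factor
tokens_per_image = tokens_per_side ** 2

# Assumption: each token is a 16-bit code, giving 2**16 distinct tokens.
vocabulary_size = 2 ** bits_per_token

print(tokens_per_side, tokens_per_image, vocabulary_size)
# prints: 16 256 65536
```

Under these assumptions, each 256×256 image maps to a 16×16 grid of 256 tokens, each drawn from 65,536 possible codes.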
You can find more details on the project page and in the paper.