deepseek-ai/JanusFlow-1.3B

XCLiu committed 4 days ago

Commit 6510424 • 1 Parent(s): 73d4fdd

Update README.md

Files changed (1)

README.md +2 -1

@@ -32,7 +32,8 @@ language model framework, eliminating the need for complex architectural modific
 JanusFlow is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation, which is constructed based on DeepSeek-LLM-1.3b-base.
 For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input.
-For image generation, JanusFlow uses rectified flow to generate in the latent space of [SDXL-VAE](https://huggingface.co/stabilityai/sdxl-vae).
 <div align="center">
 <img alt="image" src="arch.png" style="width:90%;">

 JanusFlow is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation, which is constructed based on DeepSeek-LLM-1.3b-base.
 For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input.
+For image generation, JanusFlow uses rectified flow and [SDXL-VAE](https://huggingface.co/stabilityai/sdxl-vae) to generate 384 x 384 images.
+The provided checkpoint is the EMA checkpoint after pre-training and supervised fine-tuning.
 <div align="center">
 <img alt="image" src="arch.png" style="width:90%;">