XCLiu committed on
Commit
6510424
1 Parent(s): 73d4fdd

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -32,7 +32,8 @@ language model framework, eliminating the need for complex architectural modific
 
  JanusFlow is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation, which is constructed based on DeepSeek-LLM-1.3b-base.
  For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input.
- For image generation, JanusFlow uses rectified flow to generate in the latent space of [SDXL-VAE](https://huggingface.co/stabilityai/sdxl-vae).
+ For image generation, JanusFlow uses rectified flow and [SDXL-VAE](https://huggingface.co/stabilityai/sdxl-vae) to generate 384 x 384 images.
+ The provided checkpoint is the EMA checkpoint after pre-training and supervised fine-tuning.
 
  <div align="center">
  <img alt="image" src="arch.png" style="width:90%;">