Update README.md
README.md
@@ -32,7 +32,8 @@ language model framework, eliminating the need for complex architectural modific
 
 JanusFlow is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation, which is constructed based on DeepSeek-LLM-1.3b-base.
 For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input.
-For image generation, JanusFlow uses rectified flow
+For image generation, JanusFlow uses rectified flow and [SDXL-VAE](https://huggingface.co/stabilityai/sdxl-vae) to generate 384 x 384 images.
+The provided checkpoint is the EMA checkpoint after pre-training and supervised fine-tuning.
 
 <div align="center">
 <img alt="image" src="arch.png" style="width:90%;">
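
The added README line says the released checkpoint is the EMA checkpoint. For readers unfamiliar with the term, the following is a minimal sketch of how an exponential moving average (EMA) of model weights is commonly maintained during training. It is not taken from the JanusFlow code base: the function name `ema_update`, the decay values, and the toy scalar "weight" are all assumptions made for illustration.

```python
def ema_update(ema_params, params, decay=0.999):
    """Blend the current parameters into the running average.

    Hypothetical helper: `decay` close to 1.0 makes the average change slowly.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# Toy "training run": one scalar weight that ramps from 0.0 up to 1.0.
weights = [0.0]
ema_weights = list(weights)  # the EMA starts as a copy of the initial weights

for step in range(1, 101):
    weights = [step / 100.0]  # stand-in for an optimizer step
    ema_weights = ema_update(ema_weights, weights, decay=0.9)

# The EMA trails the raw weights, smoothing out step-to-step changes.
print(weights[0], ema_weights[0])
```

In this sketch the averaged copy, not the raw weights, is what would be saved as the "EMA checkpoint". Which decay the authors used, and at which training stages the average was updated, is not stated in the README.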