Add image-to-image pipeline tag and library name
#1 opened by nielsr (HF staff)
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: diffusers
+pipeline_tag: image-to-image
 tags:
 - lumos
 - image to image
@@ -7,6 +9,7 @@ tags:
 - novel view synthesis
 - image to video
 ---
+
 <p align="center">
 <img src="asset/logo.gif" height=20>
 </p>
@@ -41,7 +44,7 @@ Source code is available at https://github.com/xiaomabufei/lumos.
 
 - **Developed by:** Lumos
 - **Model type:** Diffusion-Transformer-based generative model
-- **License:**
+- **License:** MIT
 - **Model Description:** **Lumos-I2I** is a model designed for generating images based on image prompts. It utilizes a [Transformer Latent Diffusion architecture](https://arxiv.org/abs/2310.00426) and incorporates a fixed, pretrained vision encoder ([DINO](
 https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth)). **Lumos-T2I** is a model that can be used to generate images based on text prompts.
 It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained text encoders ([T5](
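
For anyone reviewing the metadata change, here is a minimal sketch of checking that the two new keys are present in the README front matter. `README_HEAD` and `parse_front_matter` are hypothetical names introduced for illustration only. The parser handles just the flat `key: value` pairs and `- item` lists seen in this diff, it is not a YAML implementation and not how the Hub reads model cards. The front matter is abridged to the lines visible in the first hunk.

```python
import re

# Front matter as it would read after this change, abridged to the lines
# visible in the first hunk of the diff (the remaining tags are omitted).
README_HEAD = """---
license: mit
library_name: diffusers
pipeline_tag: image-to-image
tags:
- lumos
- image to image
---
"""


def parse_front_matter(text):
    """Parse flat `key: value` pairs and simple `- item` lists.

    Illustrative only: a deliberately small parser, not full YAML.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match is None:
        return {}
    meta = {}
    current_list = None
    for line in match.group(1).splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # A "- item" line belongs to the most recent key that had no value.
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        value = value.strip()
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            current_list = []
            meta[key.strip()] = current_list
    return meta


meta = parse_front_matter(README_HEAD)
print(meta["library_name"], meta["pipeline_tag"])
```

The same check could be run against the full README once the change is merged, with a real YAML parser swapped in for the sketch above.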