---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: >-
    [TWO-VIEWS] This set of two images presents a scene from two different
    viewpoints. [IMAGE1] The first image shows a living room with a sofa set
    with cushions, side tables with table lamps, a flat screen television on a
    table, houseplants, wall hangings, electric lights, and a carpet on the
    floor. [IMAGE2] The second image shows the same room but in another
    viewpoint.
  output:
    url: images/livingroom_incontext_1.jpg
- text: >-
    [TWO-VIEWS] This set of two images presents a scene from two different
    viewpoints. [IMAGE1] The first image shows a living room with a sofa set
    with cushions, side tables with table lamps, a flat screen television on a
    table, houseplants, wall hangings, electric lights, and a carpet on the
    floor. [IMAGE2] The second image shows the same room but in another
    viewpoint.
  output:
    url: images/livingroom_incontext_0.jpg
- text: >-
    [TWO-VIEWS] This set of two images presents a scene from two different
    viewpoints. [IMAGE1] The first image shows a bedroom with a bed, dresser,
    and window. The bed is covered with a blanket and pillows, and there is a
    carpet on the floor. On the right side of the room, there are cupboards with
    drawers, a mirror, a lamp, and other objects. There is also a chair, a table
    with a lamp and other items, and another table with various items. The walls
    are adorned with photo frames, and the windows have curtains. Through the
    window, we can see trees outside. [IMAGE2] The second image shows the same
    room but in another viewpoint.
  output:
    url: images/bedroom_incontext_0.jpg
- text: >-
    [TWO-VIEWS] This set of two images presents a scene from two different
    viewpoints. [IMAGE1] The first image shows a bedroom with a bed, dresser,
    and window. The bed is covered with a blanket and pillows, and there is a
    carpet on the floor. On the right side of the room, there are cupboards with
    drawers, a mirror, a lamp, and other objects. There is also a chair, a table
    with a lamp and other items, and another table with various items. The walls
    are adorned with photo frames, and the windows have curtains. Through the
    window, we can see trees outside. [IMAGE2] The second image shows the same
    room but in another viewpoint.
  output:
    url: images/bedroom_incontext_1.jpg
base_model: black-forest-labs/FLUX.1-dev
instance_prompt: null
license: mit
---
# MultiView-InContext-Lora

<Gallery />

## Model description 

Inspired by [In-Context-LoRA](https://github.com/ali-vilab/In-Context-LoRA), this project aims to generate multi-view images of the same scene or object simultaneously. Using FLUX with the multiview-incontext LoRA, the views are generated together in a single image, which can then be divided into portions to obtain the novel views.

> **_NOTE:_** This is a beta release of the model. The consistency between views may not be perfect, and the model might sometimes generate views that don't perfectly align or maintain exact object positions across viewpoints. I am working on improving the geometric consistency and spatial relationships between generated views.

## News

- 2024-11-25: Released the beta v0.3 model checkpoint. Consistency between views is much improved compared to the previous version.

## Roadmap

- [x] 🔄 Improve the consistency between the two-view images.
    - [ ] Add camera control to the prompt to manage the similarity between the two views.
- [ ] 4️⃣ Generate 4 views of a scene in a grid format.
- [ ] 🧸 Generate 4 canonical-coordinate viewpoints of a single object in a grid format.
- [ ] 🏛️ 3D reconstruction from multi-view images.

When applying the LoRA to the FluxInpaint pipeline, I noticed significant degradation in consistency between the generated and input views. I therefore plan to train the LoRA for the FluxFill model as well, rather than only for the original Flux text-to-image model, to improve performance.

## Inference

```python
import torch
from diffusers import FluxPipeline

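# Load the FLUX.1-dev base model in bfloat16.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
# Load the two-view LoRA weights and fuse them into the base model.
pipeline.load_lora_weights(
    "ysmao/multiview-incontext",
    weight_name="twoview-incontext-b03.safetensors",
)
pipeline.fuse_lora()
# Assumes a CUDA GPU is available; adjust the device for your setup.
pipeline.to("cuda")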

scene_prompt = "a living room with a sofa set with cushions, side tables with table lamps, a flat screen television on a table, houseplants, wall hangings, electric lights, and a carpet on the floor"
prompt = f"[TWO-VIEWS] This set of two images presents a scene from two different viewpoints. [IMAGE1] The first image shows {scene_prompt}. [IMAGE2] The second image shows the same room but in another viewpoint."
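# Size of a single view. The two views are generated side by side,
# so the output canvas is twice as wide as one view.
image_height = 576
image_width = 864
output = pipeline(
    prompt=prompt,
    height=image_height,
    width=image_width * 2,
    num_inference_steps=30,
    guidance_scale=3.5,
).images[0]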

output.save("twoview-incontext-beta.png")
```
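
The pipeline above returns one wide image containing both views. A minimal sketch of dividing it into the individual views is shown below, assuming the views are laid out side by side with equal widths (as the `width=image_width * 2` setting suggests). The helper names `view_boxes` and `split_views` are hypothetical and not part of this repository or of diffusers.

```python
def view_boxes(width, height, num_views=2):
    """Return (left, upper, right, lower) crop boxes for equal-width,
    side-by-side views. Assumes a horizontal layout."""
    view_width = width // num_views
    return [
        (i * view_width, 0, (i + 1) * view_width, height)
        for i in range(num_views)
    ]


def split_views(image, num_views=2):
    """Crop a PIL-style image (any object with .size and .crop) into views."""
    width, height = image.size
    return [image.crop(box) for box in view_boxes(width, height, num_views)]
```

With the `output` image from the inference example, `first_view, second_view = split_views(output)` should give two 864x576 images that can each be saved with `.save(...)`.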

## Download model

Weights for this model are available in Safetensors format.

[Download](/ysmao/multiview-incontext/tree/main) them in the Files & versions tab.