update README

README.md

## Related Links

For more technical details and discussions, please refer to:

- **Paper:** https://arxiv.org/abs/2412.05337
- **Code:** https://github.com/turingmotors/ACT-Bench
- **Blog Post:** [Building a driving version of "Sora": the development background of Terra, a world model for video generation](https://zenn.dev/turing_motors/articles/6c0ddc10aae542) (ja) / [Create a driving version of "Sora"](https://medium.com/@hide1996/create-a-driving-version-of-sora-33cf4040937a) (en)

## How to use

We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU. However, we believe the model can run on any machine with an NVIDIA GPU that has 16GB or more of VRAM.
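
If you want to confirm that your GPU meets this requirement before running anything, a quick check with PyTorch looks like the sketch below. It assumes PyTorch is importable in your environment (e.g., after the `uv sync` step described later); it is a convenience check, not part of the repository.

```python
# Sanity check: is a CUDA GPU visible, and does it have enough VRAM?
# Assumes PyTorch is installed (e.g., via the `uv sync` step below).
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU detected."
props = torch.cuda.get_device_properties(0)
vram_gib = props.total_memory / 1024**3
print(f"{props.name}: {vram_gib:.1f} GiB VRAM")
if vram_gib < 16:
    print("Warning: less than 16 GiB of VRAM; generation may run out of memory.")
```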

Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is relatively involved, please refer to the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench) for detailed instructions. Here, we provide an example of generating video continuations with the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; you can use the Video Refiner to improve the visual quality.
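
Conceptually, the two stages used in this example fit together as in the sketch below. Every class, method, and tensor shape here is a hypothetical stand-in for illustration, not the actual Terra API; `inference.py` implements the real pipeline.

```python
# Conceptual sketch only: all names and shapes are hypothetical stand-ins,
# not the actual Terra API (see inference.py for the real pipeline).
import torch

class ImageTokenizer:
    """Stand-in: encodes frames to discrete codes and decodes codes back to frames."""
    def encode(self, frames: torch.Tensor) -> torch.Tensor:
        return torch.randint(0, 1024, (frames.shape[0], 256))  # fake token grid
    def decode(self, codes: torch.Tensor) -> torch.Tensor:
        return torch.zeros(3, 288, 512)  # fake RGB frame

class AutoregressiveTransformer:
    """Stand-in: predicts future frame tokens conditioned on a trajectory."""
    def generate(self, codes: torch.Tensor, trajectory: str, num_future: int = 4):
        return [torch.randint(0, 1024, (256,)) for _ in range(num_future)]

frames = torch.zeros(3, 3, 288, 512)  # three conditioning frames (dummy data)
trajectory = "curving_to_left/curving_to_left_moderate"

tokenizer, transformer = ImageTokenizer(), AutoregressiveTransformer()
codes = tokenizer.encode(frames)                        # 1. tokenize context frames
future_codes = transformer.generate(codes, trajectory)  # 2. predict future tokens
# 3. decode each predicted frame individually; this per-frame decoding is why
#    the raw output can look unstable without the Video Refiner.
video = torch.stack([tokenizer.decode(c) for c in future_codes])
print(video.shape)  # (num_future, 3, H, W)
```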

### Install Packages

We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If you don't have uv installed in your environment, please see the uv documentation.

```shell
$ git clone https://huggingface.co/turing-motors/Terra
$ uv sync
```

### Action-Conditioned Video Generation without Video Refiner

```shell
$ python inference.py
```

This command generates a video using three image frames located in  and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file .

You can find more details by referring to the `inference.py` script.
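
If you want to generate a video for a different maneuver, the trajectory template file enumerates named trajectories like the one above. The sketch below shows how such a selection might look; the file name, schema, and key layout are all assumptions for illustration, so check `inference.py` for the real loading logic.

```python
# Hypothetical illustration only: the template file name and schema used here
# are assumptions, not the repository's actual format (see inference.py).
import json

with open("trajectory_templates.json") as f:  # assumed file name
    templates = json.load(f)

# Assumed nested layout addressed by names like
# "curving_to_left/curving_to_left_moderate".
group, variant = "curving_to_left/curving_to_left_moderate".split("/")
trajectory = templates[group][variant]
print(f"Loaded trajectory '{group}/{variant}' with {len(trajectory)} waypoints")
```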

## Citation