|
--- |
|
license: cc-by-nc-sa-4.0 |
|
pipeline_tag: image-to-video |
|
tags: |
|
- turing |
|
- autonomous driving |
|
- video generation |
|
- world model |
|
--- |
|
|
|
# Terra |
|
|
|
**Terra** is a world model designed for autonomous driving and serves as a baseline model in the [ACT-Bench](https://github.com/turingmotors/ACT-Bench) framework. |
|
Terra generates video continuations conditioned on a short video clip of approximately three frames and a trajectory instruction. |
|
A key feature of Terra is its **high adherence to trajectory instructions**, enabling accurate and reliable action-conditioned video generation. |
|
|
|
## Related Links |
|
|
|
For more technical details and discussions, please refer to: |
|
- **Paper:** https://arxiv.org/abs/2412.05337 |
|
- **Code:** https://github.com/turingmotors/ACT-Bench |
|
|
|
## How to use |
|
|
|
We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU, but we believe the model can also run on any machine with an NVIDIA GPU that has 16 GB or more of VRAM. |
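
As a quick sanity check, the following snippet (assuming PyTorch is already installed in your environment) prints the name and total VRAM of the first visible GPU:

```python
# Quick sanity check: report the first visible GPU and its total VRAM.
# Assumes PyTorch is installed; roughly 16 GB or more is recommended for Terra.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected.")
```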
|
|
|
Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is more involved, please refer to the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench) for detailed instructions. Here, we provide an example of generating video continuations with only the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; the Video Refiner can be used to improve the visual quality. |
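
For intuition, here is a minimal conceptual sketch of that generation loop. The object and method names below are placeholders, not Terra's actual API; see `inference.py` and the ACT-Bench repository for the real entry points.

```python
# Conceptual sketch only: `tokenizer` and `transformer` are assumed, already-loaded
# stand-ins for Terra's Image Tokenizer and Autoregressive Transformer, and the
# method names are hypothetical; refer to inference.py for the actual interface.
import torch

@torch.no_grad()
def generate_continuation(tokenizer, transformer, cond_frames, trajectory, n_future=12):
    # 1) Encode the ~3 conditioning frames into discrete token grids.
    cond_tokens = [tokenizer.encode(frame) for frame in cond_frames]
    # 2) Autoregressively predict future frame tokens, conditioned on the trajectory.
    future_tokens = transformer.generate(cond_tokens, trajectory, n_future)
    # 3) Decode each predicted frame individually; this per-frame decoding is why
    #    the raw output can look inconsistent without the Video Refiner.
    return [tokenizer.decode(tokens) for tokens in future_tokens]
```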
|
|
|
### Install Packages |
|
|
|
We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If you don't have uv installed in your environment, please refer to its documentation. |
|
|
|
```shell |
|
$ git clone https://huggingface.co/turing-motors/Terra |

$ cd Terra |

$ uv sync |
|
``` |
|
|
|
### Action-Conditioned Video Generation without Video Refiner |
|
|
|
```shell |
|
$ python inference.py |
|
``` |
|
|
|
This command generates a video using three conditioning image frames included in the repository and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file. |
|
|
|
You can find more details by referring to the `inference.py` script. |
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{arai2024actbench, |
|
title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving}, |
|
author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi}, |
|
year={2024}, |
|
eprint={2412.05337}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2412.05337}, |
|
} |
|
``` |