|
--- |
|
license: cc-by-nc-sa-4.0 |
|
pipeline_tag: image-to-video |
|
tags: |
|
- turing |
|
- autonomous driving |
|
- video generation |
|
- world model |
|
--- |
|
|
|
# Terra |
|
|
|
**Terra** is a world model designed for autonomous driving and serves as a baseline model in the [ACT-Bench](https://github.com/turingmotors/ACT-Bench) framework. |
|
Terra generates video continuations conditioned on a short video clip of approximately three frames and a trajectory instruction. |
|
A key feature of Terra is its **high adherence to trajectory instructions**, enabling accurate and reliable action-conditioned video generation. |
|
|
|
## Related Links |
|
|
|
For more technical details and discussions, please refer to: |
|
- **Paper:** https://arxiv.org/abs/2412.05337 |
|
- **Code:** https://github.com/turingmotors/ACT-Bench |
|
|
|
## How to use |
|
|
|
We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU, but we believe the model can also run on any machine with an NVIDIA GPU that has 16 GB or more of VRAM. |
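
As a quick sanity check, the following snippet (assuming PyTorch is already installed in your environment) prints the name and total VRAM of the first visible GPU:

```python
# Quick sanity check: report the first visible GPU and its total VRAM.
# Assumes PyTorch is installed; roughly 16 GB or more is recommended for Terra.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected.")
```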
|
|
|
Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is more involved, please refer to the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench) for detailed instructions. Here, we provide an example of generating video continuations with only the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; the Video Refiner can be used to improve the visual quality. |
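
For intuition, here is a minimal conceptual sketch of that generation loop. The object and method names below are placeholders, not Terra's actual API; see `inference.py` and the ACT-Bench repository for the real entry points.

```python
# Conceptual sketch only: `tokenizer` and `transformer` are assumed, already-loaded
# stand-ins for Terra's Image Tokenizer and Autoregressive Transformer, and the
# method names are hypothetical; refer to inference.py for the actual interface.
import torch

@torch.no_grad()
def generate_continuation(tokenizer, transformer, cond_frames, trajectory, n_future=12):
    # 1) Encode the ~3 conditioning frames into discrete token grids.
    cond_tokens = [tokenizer.encode(frame) for frame in cond_frames]
    # 2) Autoregressively predict future frame tokens, conditioned on the trajectory.
    future_tokens = transformer.generate(cond_tokens, trajectory, n_future)
    # 3) Decode each predicted frame individually; this per-frame decoding is why
    #    the raw output can look inconsistent without the Video Refiner.
    return [tokenizer.decode(tokens) for tokens in future_tokens]
```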
|
|
|
### Install Packages |
|
|
|
We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If you don't have uv installed in your environment, please refer to its documentation. |
|
|
|
```shell |
|
$ git clone https://huggingface.co/turing-motors/Terra |

$ cd Terra |

$ uv sync |
|
``` |
|
|
|
### Action-Conditioned Video Generation without Video Refiner |
|
|
|
```shell |
|
$ python inference.py |
|
``` |
|
|
|
This command generates a video using three conditioning image frames included in the repository and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file. |
|
|
|
You can find more details by referring to the `inference.py` script. |
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{arai2024actbench, |
|
title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving}, |
|
author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi}, |
|
year={2024}, |
|
eprint={2412.05337}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2412.05337}, |
|
} |
|
``` |