---
library_name: keras
license: mit
language:
- en
pipeline_tag: image-to-image
tags:
- art
- pixel_art
- character_sprite
- missing_data_imputation
- image_to_image
---

## Model description

The MDIGAN-Characters model was proposed at SBGames 2024 ([paper on arXiv][paper-arxiv], [page][paper-page], [demo][paper-demo]). It is trained to generate a character in a missing pose: for instance, given images of a character facing back, left, and right, it can generate the character facing front (a missing data imputation task).

![](https://i.imgur.com/s5ONl9Q.png)

The model's architecture is based on that of [CollaGAN][paper-collagan], a model trained to impute images in missing domains in a multi-domain scenario. In our case, the domains are the sides a character might face, i.e., back, left, front, and right. We tested providing 3 images to the model so that it generates the missing one, and we also evaluated the quality of the generated images when the model receives only 2 or 1 input images.

The inputs to the model are the target (missing) domain and 4 image-like tensors with size 64x64x4 in the order back, left, front, and right. The input images should be floating point tensors in the range [-1, 1]. In place of each missing image, we must provide a tensor with shape 64x64x4 filled with zeros.

[paper-collagan]: https://www.computer.org/csdl/proceedings-article/cvpr/2019/329300c482/1gys5gg67QY
[paper-arxiv]: https://arxiv.org/abs/2409.10721
[paper-page]: https://fegemo.github.io/mdigan-characters
[paper-demo]: https://fegemo.github.io/interactive-generator

## Intended uses & limitations

This model can be used for research purposes only. The quality of the generated images varies a lot, and a post-processing step that quantizes the colors of the generated image to the intended palette is beneficial.
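
The palette quantization step mentioned above could be sketched as follows. This is a minimal illustration, not necessarily the post-processing used by the authors: it assumes the generated image has already been converted to the same value range as the palette (here `uint8` RGBA), and it picks the nearest palette color by Euclidean distance, which is one simple choice among several. The function name `quantize_to_palette` and the example palette are hypothetical.

```python
import numpy as np


def quantize_to_palette(image, palette):
    """Replace each pixel with its nearest palette color (Euclidean distance).

    image:   array of shape (height, width, channels)
    palette: array of shape (num_colors, channels), same value range as image
    """
    channels = image.shape[-1]
    # Shape (num_pixels, 1, channels) so it broadcasts against the palette.
    pixels = image.reshape(-1, 1, channels).astype(np.float32)
    colors = palette[np.newaxis].astype(np.float32)  # (1, num_colors, channels)
    distances = np.linalg.norm(pixels - colors, axis=-1)  # (num_pixels, num_colors)
    nearest = distances.argmin(axis=1)
    return palette[nearest].reshape(image.shape)


# Hypothetical 3-color RGBA palette: transparent black, white, and red.
palette = np.array(
    [[0, 0, 0, 0], [255, 255, 255, 255], [255, 0, 0, 255]], dtype=np.uint8
)
# A tiny 2x2 stand-in for a generated image with slightly "off" colors.
image = np.array(
    [[[250, 10, 5, 255], [3, 2, 1, 4]],
     [[240, 250, 245, 255], [0, 0, 0, 0]]],
    dtype=np.uint8,
)
quantized = quantize_to_palette(image, palette)
```

The distance matrix grows with the number of pixels times the number of palette colors, which stays small for 64x64 sprites and typical pixel art palettes.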

## Training and evaluation data

The model was trained with the [PAC dataset][pac], which features 12,074 paired images of pixel art characters in 4 directions: back, left, front, and right.

Compared to StarGAN and Pix2Pix-based baselines, the MDIGAN-Characters model yielded much better images when it received 3 images, and still good images when only 2 were provided.

[pac]: https://github.com/plucksquire/pac/
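
## Preparing the inputs

The input format given in the model description (4 tensors of size 64x64x4 in the order back, left, front, right, with values in [-1, 1] and zeros in place of missing images) can be assembled roughly as shown here. This is a sketch under assumptions: the helper names are hypothetical, the source images are assumed to be `uint8` arrays, the 4 channels are assumed to be RGBA, and the target domain is represented as an integer index. How the target domain and the images are actually passed to the Keras model is not specified here, so the sketch stops before calling the model.

```python
import numpy as np

DOMAINS = ("back", "left", "front", "right")  # order from the model description
IMAGE_SHAPE = (64, 64, 4)  # height, width, channels (assumed RGBA)


def to_model_range(image_uint8):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return image_uint8.astype(np.float32) / 127.5 - 1.0


def build_inputs(available, target):
    """Hypothetical helper that arranges the images in domain order.

    available: dict mapping a domain name to a uint8 array of IMAGE_SHAPE
    target:    name of the missing domain to generate
    Returns the target index (an assumed encoding) and a (4, 64, 64, 4) array.
    """
    if target not in DOMAINS:
        raise ValueError(f"unknown domain: {target!r}")
    images = []
    for name in DOMAINS:
        if name == target or name not in available:
            # Missing images are replaced by zero-filled tensors.
            images.append(np.zeros(IMAGE_SHAPE, dtype=np.float32))
            continue
        image = to_model_range(available[name])
        if image.shape != IMAGE_SHAPE:
            raise ValueError(f"{name} image has shape {image.shape}")
        images.append(image)
    return DOMAINS.index(target), np.stack(images)


# Random stand-ins for three known poses; "front" is the pose to generate.
rng = np.random.default_rng(0)
available = {
    name: rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)
    for name in ("back", "left", "right")
}
target_index, images = build_inputs(available, "front")
```

Leaving more domains out of `available` gives the 2-input and 1-input settings, since every absent domain is filled with zeros.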