---
datasets:
- ffurfaro/PixelBytes-PokemonAll
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- audio-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models designed to generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. (The released weights are for testing only.)

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time-series prediction.
- **MambaByte**: A token-free selective state-space model.

The PixelBytes model generates mixed sequences of text and images, handling transitions between modalities with line breaks and maintaining image dimension consistency.
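
The interleaving described above can be sketched as follows. This is a minimal illustration, not the repository's actual encoding: the use of a newline byte as the transition marker matches the description, but the pixel token values and the `build_sequence` helper are assumptions.

```python
# Sketch: build one unified sequence mixing text bytes and image "pixel"
# tokens. Assumed convention: a newline byte (0x0A) marks the transition
# from text to image, and each image row ends with the same newline so
# every row has a consistent width.

NEWLINE = 0x0A


def build_sequence(text: str, image_rows: list[list[int]]) -> list[int]:
    """Interleave UTF-8 text bytes with pixel tokens, row by row."""
    seq = list(text.encode("utf-8"))
    seq.append(NEWLINE)  # transition from text to image
    width = len(image_rows[0])
    for row in image_rows:
        assert len(row) == width, "image rows must keep a consistent width"
        seq.extend(row)
        seq.append(NEWLINE)  # end of one image row
    return seq


seq = build_sequence("Pikachu", [[1, 2], [3, 4]])
# 7 text bytes + 1 newline + 2 rows of (2 pixels + 1 newline) = 14 tokens
print(len(seq))  # 14
```

The dimension-consistency check lives in the row loop: a model trained on such sequences can learn that a newline token closes a row, which is how the prose above describes transitions being handled.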

## Dataset

We use the **PixelBytes-PokemonAll** dataset, available on Hugging Face: [PixelBytes-PokemonAll](https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll). It contains text and image sequences of Pokémon for training our model.

## Models Trained

- **3 LSTM models**: 2 autoregressive and 1 predictive-only.
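
For orientation, an autoregressive byte-level LSTM of the kind listed above can be sketched in PyTorch. The layer sizes, vocabulary size, and class name here are illustrative assumptions, not the configuration of the released checkpoints:

```python
import torch
import torch.nn as nn


class ARByteLSTM(nn.Module):
    """Autoregressive LSTM: predicts next-token logits at every position.

    Hypothetical sketch; dimensions are not the trained models' settings.
    """

    def __init__(self, vocab_size: int = 256, embed_dim: int = 64,
                 hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids
        # returns: (batch, seq_len, vocab_size) next-token logits
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)


model = ARByteLSTM()
x = torch.randint(0, 256, (2, 16))  # batch of 2 sequences, 16 tokens each
logits = model(x)
print(logits.shape)  # torch.Size([2, 16, 256])
```

Training such a model shifts the targets by one position, so each output predicts the following byte; the predictive-only variant would instead be trained to predict a fixed target rather than to continue the sequence.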

## Citation

```bibtex
@misc{furfaro2024pixelbytes_project,
  author = {Furfaro, Fabien},
  title = {PixelBytes: A Unified Multimodal Representation Learning Project},
  year = {2024},
  howpublished = {GitHub: https://github.com/fabienfrfr/PixelBytes},
  note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
}
```

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.