datasets:
  - ffurfaro/PixelBytes-PokemonAll
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
  - image-generation
  - text-generation
  - audio-generation
  - multimodal

PixelBytes: Unified Multimodal Generation

Welcome to the PixelBytes repository! This project features models designed to generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. Note that this repository contains testing weights only.

Overview

Key Concepts

  • Image Transformer: Generates images pixel by pixel.
  • Bi-Mamba+: A bidirectional model for time series prediction.
  • MambaByte: A token-free selective state-space model that operates directly on bytes.

The PixelBytes model generates mixed sequences of text and images, using line breaks to handle transitions and keeping image dimensions consistent.
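To make the idea of a mixed sequence concrete, here is a minimal pure-Python sketch. It is not the PixelBytes implementation: the token scheme (characters for text, integer palette indices for pixels), the `NEWLINE` separator, and the helper names `encode_image`, `build_sequence`, and `decode_image` are all assumptions chosen for illustration. It only shows how line breaks can delimit image rows and how row widths can be checked for consistency.

```python
from typing import List, Union

# Hypothetical token scheme (an assumption, not the PixelBytes format):
# text is stored as single characters, pixels as integer palette indices.
Token = Union[int, str]
NEWLINE = "\n"


def encode_image(rows: List[List[int]]) -> List[Token]:
    """Flatten an image row by row, ending each row with a line break."""
    if not rows:
        return []
    width = len(rows[0])
    tokens: List[Token] = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        tokens.extend(row)
        tokens.append(NEWLINE)
    return tokens


def build_sequence(text: str, rows: List[List[int]]) -> List[Token]:
    """Concatenate text and image tokens, separated by a line break."""
    return list(text) + [NEWLINE] + encode_image(rows)


def decode_image(tokens: List[Token]) -> List[List[int]]:
    """Recover pixel rows from a mixed sequence and verify equal widths."""
    rows: List[List[int]] = []
    current: List[int] = []
    for tok in tokens:
        if tok == NEWLINE:
            if current:
                rows.append(current)
                current = []
        elif isinstance(tok, int):
            current.append(tok)
    if current:
        rows.append(current)
    if len({len(r) for r in rows}) > 1:
        raise ValueError("decoded image rows have inconsistent widths")
    return rows
```

For example, `build_sequence("Pikachu", [[1, 2, 3], [4, 5, 6]])` yields the seven characters of the caption, a line break, and then two pixel rows that each end in a line break; `decode_image` recovers the two rows from that sequence.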

Dataset

We use the PixelBytes-PokemonAll dataset, available on Hugging Face as ffurfaro/PixelBytes-PokemonAll. It contains text and image sequences of Pokémon for training our model.
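The dataset can presumably be fetched with the Hugging Face `datasets` library. The sketch below assumes that `datasets` is installed and that the repository loads with its default configuration. The wrapper name `load_pixelbytes_dataset` is hypothetical, and no split or column names are assumed, since this README does not list them.

```python
# Dataset identifier taken from this model card's metadata.
DATASET_ID = "ffurfaro/PixelBytes-PokemonAll"


def load_pixelbytes_dataset():
    """Download the dataset from the Hugging Face Hub (network access required).

    Hypothetical convenience wrapper; it assumes the default configuration
    is sufficient and returns whatever `load_dataset` provides.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_pixelbytes_dataset()` and printing the result shows the available splits and features, which is a reasonable first step before writing any training code.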

Models Trained

  • 3 LSTM models: 2 autoregressive and 1 predictive-only.

Citation

.. code-block:: bibtex

   @misc{furfaro2024pixelbytes_project,
     author = {Furfaro, Fabien},
     title = {PixelBytes: A Unified Multimodal Representation Learning Project},
     year = {2024},
     howpublished = {
       GitHub: \url{https://github.com/fabienfrfr/PixelBytes},
       Models: \url{https://huggingface.co/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/ffurfaro/aPixelBytes-Pokemon},
       Datasets: \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll}
     },
     note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
   }


Thank you for exploring PixelBytes! We hope this model aids your multimodal generation projects.