---
datasets:
- ffurfaro/PixelBytes-PokemonAll
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- audio-generation
- multimodal
---
# PixelBytes: Unified Multimodal Generation
Welcome to the **PixelBytes** repository! This project features models designed to generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. (These weights are provided for testing purposes only.)
## Overview
### Key Concepts
- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time series prediction.
- **MambaByte**: A selective state-space model without tokens.
The PixelByte model generates mixed sequences of text and images, handling transitions with line breaks and maintaining image dimension consistency.
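Below is a minimal, illustrative sketch of the unified-embedding idea: a single vocabulary covers text bytes, pixel values, and control symbols (such as the line break used between image rows), and an autoregressive recurrent model predicts the next symbol. The class name, vocabulary size, and layer sizes here are assumptions for illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn

class PixelByteSketch(nn.Module):
    """Illustrative sketch only; not the actual PixelBytes implementation."""
    def __init__(self, vocab_size=264, embed_dim=128, hidden_dim=256):
        super().__init__()
        # One shared embedding table for text bytes, pixel values,
        # and control symbols (e.g. the line-break token between rows).
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids from the unified vocabulary
        x = self.embed(tokens)
        out, _ = self.rnn(x)
        return self.head(out)  # next-symbol logits at every position

model = PixelByteSketch()
dummy = torch.randint(0, 264, (1, 16))
print(model(dummy).shape)  # torch.Size([1, 16, 264])
```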
## Dataset
We use the **PixelBytes-PokemonAll** dataset, available on Hugging Face: [PixelBytes-PokemonAll](https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll). It contains text and image sequences of Pokémon for training our model.
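The dataset can be loaded with the `datasets` library. The split name and column layout are not described here, so inspect the loaded object before use:

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
ds = load_dataset("ffurfaro/PixelBytes-PokemonAll")
print(ds)  # available splits and sizes

# Assuming a "train" split exists, look at one mixed text/image sequence.
print(ds["train"][0])
```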
## Models Trained
- **3 LSTM Models**: two auto-regressive and one purely predictive.
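Since this is a plain PyTorch repository, a checkpoint can be fetched from the Hub with `huggingface_hub` and loaded with `torch.load`. The repository id and file name below are assumptions; check the "Files and versions" tab for the actual checkpoint names.

```python
import torch
from huggingface_hub import hf_hub_download

# Both the repo_id and filename are placeholders; adjust to the
# checkpoint you want from the model repositories listed in the citation.
ckpt_path = hf_hub_download(repo_id="ffurfaro/aPixelBytes-Pokemon",
                            filename="model.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
```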
## Citation

```bibtex
@misc{furfaro2024pixelbytes_project,
  author = {Furfaro, Fabien},
  title = {PixelBytes: A Unified Multimodal Representation Learning Project},
  year = {2024},
  howpublished = {
    GitHub: \url{https://github.com/fabienfrfr/PixelBytes},
    Models: \url{https://huggingface.co/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/ffurfaro/aPixelBytes-Pokemon},
    Datasets: \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon} and \url{https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll}
  },
  note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
}
```
---
Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.