---
datasets:
- ffurfaro/PixelBytes-PokemonAll
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- audio-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models designed to generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. (The released weights are for testing only.)

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time-series prediction.
- **MambaByte**: A token-free selective state-space model.

The PixelBytes model generates mixed sequences of text and images, handling transitions between modalities with line breaks and maintaining image dimension consistency.
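
The interleaving described above can be sketched as follows. This is a minimal illustration, not the repository's actual encoding: the use of a newline byte as the transition marker matches the description, but the pixel token values and the `build_sequence` helper are assumptions.

```python
# Sketch: build one unified sequence mixing text bytes and image "pixel"
# tokens. Assumed convention: a newline byte (0x0A) marks the transition
# from text to image, and each image row ends with the same newline so
# every row has a consistent width.

NEWLINE = 0x0A


def build_sequence(text: str, image_rows: list[list[int]]) -> list[int]:
    """Interleave UTF-8 text bytes with pixel tokens, row by row."""
    seq = list(text.encode("utf-8"))
    seq.append(NEWLINE)  # transition from text to image
    width = len(image_rows[0])
    for row in image_rows:
        assert len(row) == width, "image rows must keep a consistent width"
        seq.extend(row)
        seq.append(NEWLINE)  # end of one image row
    return seq


seq = build_sequence("Pikachu", [[1, 2], [3, 4]])
# 7 text bytes + 1 newline + 2 rows of (2 pixels + 1 newline) = 14 tokens
print(len(seq))  # 14
```

The dimension-consistency check lives in the row loop: a model trained on such sequences can learn that a newline token closes a row, which is how the prose above describes transitions being handled.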

## Dataset

We use the **PixelBytes-PokemonAll** dataset, available on Hugging Face: [PixelBytes-PokemonAll](https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll). It contains text and image sequences of Pokémon for training our model.

## Models Trained

- **3 LSTM models**: 2 autoregressive and 1 predictive-only.
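
For orientation, an autoregressive byte-level LSTM of the kind listed above can be sketched in PyTorch. The layer sizes, vocabulary size, and class name here are illustrative assumptions, not the configuration of the released checkpoints:

```python
import torch
import torch.nn as nn


class ARByteLSTM(nn.Module):
    """Autoregressive LSTM: predicts next-token logits at every position.

    Hypothetical sketch; dimensions are not the trained models' settings.
    """

    def __init__(self, vocab_size: int = 256, embed_dim: int = 64,
                 hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids
        # returns: (batch, seq_len, vocab_size) next-token logits
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)


model = ARByteLSTM()
x = torch.randint(0, 256, (2, 16))  # batch of 2 sequences, 16 tokens each
logits = model(x)
print(logits.shape)  # torch.Size([2, 16, 256])
```

Training such a model shifts the targets by one position, so each output predicts the following byte; the predictive-only variant would instead be trained to predict a fixed target rather than to continue the sequence.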

## Citation

```bibtex
@misc{furfaro2024pixelbytes_project,
  author = {Furfaro, Fabien},
  title = {PixelBytes: A Unified Multimodal Representation Learning Project},
  year = {2024},
  howpublished = {GitHub: https://github.com/fabienfrfr/PixelBytes},
  note = {GitHub repository, Hugging Face Model Hub, and Datasets Hub}
}
```

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.