---
library_name: transformers
license: mit
datasets:
- grascii/gregg-preanniversary-words
pipeline_tag: image-to-text
tags:
- gregg
- shorthand
- stenography
---

# Gregg Vision v0.2.1

Gregg Vision v0.2.1 generates a [Grascii](https://github.com/grascii/grascii) representation of a Gregg Shorthand form.

- **Model type:** Vision Encoder Text Decoder
- **License:** MIT
- **Repository:** [GitHub](https://github.com/grascii/gregg-vision-v0.2.1)
- **Demo:** [Grascii Search Space](https://huggingface.co/spaces/grascii/search)

## Uses

Given a grayscale image of a single shorthand form, Gregg Vision generates its Grascii representation. When combined with [Grascii Search](https://github.com/grascii/grascii), the Grascii string can be used to look up possible English interpretations of the shorthand form.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from PIL import Image
import numpy as np

model_id = "grascii/gregg-vision-v0.2.1"

model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


def generate_grascii(image: Image.Image) -> str:
    # convert the image to a single grayscale channel
    grayscale = image.convert("L")

    # prepare the processor input as a batch of one image
    images = np.array([grayscale])

    # preprocess the image into pixel values
    pixel_values = processor(images, return_tensors="pt").pixel_values

    # generate token ids
    ids = model.generate(pixel_values, max_new_tokens=12)[0]

    # decode the ids and return the Grascii string
    return tokenizer.decode(ids, skip_special_tokens=True)
```

Note: As of `transformers` v4.47.0, the model is incompatible with `pipeline` due to the model's single-channel image input.

## Technical Details

### Model Architecture and Objective

Gregg Vision v0.2.1 is a transformer model with a ViT encoder and a RoBERTa decoder. For training, the model was warm-started using [vit-small-patch16-224-single-channel](https://huggingface.co/grascii/vit-small-patch16-224-single-channel) for the encoder and a randomly initialized RoBERTa network for the decoder. A rough sketch of this warm-start setup appears at the end of this card.

### Training Data

Gregg Vision v0.2.1 was trained on the [gregg-preanniversary-words](https://huggingface.co/datasets/grascii/gregg-preanniversary-words) dataset.

### Training Hardware

Gregg Vision v0.2.1 was trained on a single NVIDIA T4 GPU.
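
### Warm-Start Sketch

The warm-start described under Model Architecture and Objective can be approximated with the `transformers` `VisionEncoderDecoderModel` API. The sketch below is an illustration under stated assumptions, not the actual training code: it assumes the single-channel checkpoint loads as a `ViTModel`, and the decoder hyperparameters and special-token wiring are guesses rather than the values used for this release.

```python
from transformers import (
    AutoTokenizer,
    RobertaConfig,
    RobertaForCausalLM,
    ViTModel,
    VisionEncoderDecoderModel,
)

# Warm-start the encoder from the released single-channel ViT checkpoint
encoder = ViTModel.from_pretrained("grascii/vit-small-patch16-224-single-channel")

# The tokenizer shipped with Gregg Vision v0.2.1 defines the Grascii vocabulary
tokenizer = AutoTokenizer.from_pretrained("grascii/gregg-vision-v0.2.1")

# Randomly initialized RoBERTa decoder with cross-attention over encoder outputs.
# The layer and head counts below are illustrative guesses, not the trained values.
decoder_config = RobertaConfig(
    vocab_size=len(tokenizer),
    hidden_size=encoder.config.hidden_size,
    num_hidden_layers=4,    # assumption
    num_attention_heads=6,  # assumption
    is_decoder=True,
    add_cross_attention=True,
)
decoder = RobertaForCausalLM(decoder_config)

# Combine the encoder and decoder into a single vision-encoder-decoder model
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

# Special-token wiring needed for training and generation; which tokens the
# released tokenizer actually uses is an assumption here
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```

A model assembled this way could then be fine-tuned on image/Grascii pairs in the usual sequence-to-sequence fashion; the released checkpoint is the result of such a warm-start followed by training on the dataset above.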