---
library_name: transformers
license: mit
datasets:
- grascii/gregg-preanniversary-words
pipeline_tag: image-to-text
tags:
- gregg
- shorthand
- stenography
---

# Gregg Vision v0.2.1

Gregg Vision v0.2.1 generates a [Grascii](https://github.com/grascii/grascii) representation of a Gregg Shorthand form.

- **Model type:** Vision Encoder Text Decoder
- **License:** MIT
- **Repository:** [GitHub](https://github.com/grascii/gregg-vision-v0.2.1)
- **Demo:** [Grascii Search Space](https://huggingface.co/spaces/grascii/search)

## Uses

Given a grayscale image of a single shorthand form, Gregg Vision generates its Grascii representation. When combined with [Grascii Search](https://github.com/grascii/grascii), the Grascii string can be used to look up possible English interpretations of the shorthand form.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from PIL import Image
import numpy as np

model_id = "grascii/gregg-vision-v0.2.1"

model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


def generate_grascii(image: Image.Image) -> str:
    # convert the image to a single grayscale channel
    grayscale = image.convert("L")

    # prepare the processor input as a batch of one image
    images = np.array([grayscale])

    # preprocess the image into pixel values
    pixel_values = processor(images, return_tensors="pt").pixel_values

    # generate token ids
    ids = model.generate(pixel_values, max_new_tokens=12)[0]

    # decode the ids and return the Grascii string
    return tokenizer.decode(ids, skip_special_tokens=True)
```

Note: As of `transformers` v4.47.0, the model is incompatible with `pipeline` due to the model's single-channel image input.

## Technical Details

### Model Architecture and Objective

Gregg Vision v0.2.1 is a transformer model with a ViT encoder and a RoBERTa decoder. For training, the model was warm-started using [vit-small-patch16-224-single-channel](https://huggingface.co/grascii/vit-small-patch16-224-single-channel) for the encoder and a randomly initialized RoBERTa network for the decoder. A rough sketch of this warm-start setup appears at the end of this card.

### Training Data

Gregg Vision v0.2.1 was trained on the [gregg-preanniversary-words](https://huggingface.co/datasets/grascii/gregg-preanniversary-words) dataset.

### Training Hardware

Gregg Vision v0.2.1 was trained on a single NVIDIA T4 GPU.
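
### Warm-Start Sketch

The warm-start described under Model Architecture and Objective can be approximated with the `transformers` `VisionEncoderDecoderModel` API. The sketch below is an illustration under stated assumptions, not the actual training code: it assumes the single-channel checkpoint loads as a `ViTModel`, and the decoder hyperparameters and special-token wiring are guesses rather than the values used for this release.

```python
from transformers import (
    AutoTokenizer,
    RobertaConfig,
    RobertaForCausalLM,
    ViTModel,
    VisionEncoderDecoderModel,
)

# Warm-start the encoder from the released single-channel ViT checkpoint
encoder = ViTModel.from_pretrained("grascii/vit-small-patch16-224-single-channel")

# The tokenizer shipped with Gregg Vision v0.2.1 defines the Grascii vocabulary
tokenizer = AutoTokenizer.from_pretrained("grascii/gregg-vision-v0.2.1")

# Randomly initialized RoBERTa decoder with cross-attention over encoder outputs.
# The layer and head counts below are illustrative guesses, not the trained values.
decoder_config = RobertaConfig(
    vocab_size=len(tokenizer),
    hidden_size=encoder.config.hidden_size,
    num_hidden_layers=4,    # assumption
    num_attention_heads=6,  # assumption
    is_decoder=True,
    add_cross_attention=True,
)
decoder = RobertaForCausalLM(decoder_config)

# Combine the encoder and decoder into a single vision-encoder-decoder model
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

# Special-token wiring needed for training and generation; which tokens the
# released tokenizer actually uses is an assumption here
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```

A model assembled this way could then be fine-tuned on image/Grascii pairs in the usual sequence-to-sequence fashion; the released checkpoint is the result of such a warm-start followed by training on the dataset above.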