---
library_name: transformers
license: mit
datasets:
- grascii/gregg-preanniversary-words
pipeline_tag: image-to-text
tags:
- gregg
- shorthand
- stenography
---
# Gregg Vision v0.2.1
Gregg Vision v0.2.1 generates a [Grascii](https://github.com/grascii/grascii) representation of a Gregg Shorthand form.
- **Model type:** Vision Encoder-Decoder
- **License:** MIT
- **Repository:** [GitHub](https://github.com/grascii/gregg-vision-v0.2.1)
- **Demo:** [Grascii Search Space](https://huggingface.co/spaces/grascii/search)
## Uses
Given a grayscale image of a single shorthand form, Gregg Vision can be used to
generate its Grascii representation. When combined with [Grascii Search](https://github.com/grascii/grascii),
one can obtain possible English interpretations of the shorthand form.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from PIL import Image
import numpy as np
model_id = "grascii/gregg-vision-v0.2.1"
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
def generate_grascii(image: Image.Image) -> str:
    # convert the image to a single channel
    grayscale = image.convert("L")
    # batch the image for the processor
    images = np.array([grayscale])
    # preprocess the image into pixel values
    pixel_values = processor(images, return_tensors="pt").pixel_values
    # generate token ids
    ids = model.generate(pixel_values, max_new_tokens=12)[0]
    # decode the ids and return the grascii string
    return tokenizer.decode(ids, skip_special_tokens=True)
```
Note: As of `transformers` v4.47.0, the model is incompatible with `pipeline` because it expects single-channel image input.
## Technical Details
### Model Architecture and Objective
Gregg Vision v0.2.1 is a transformer model with a ViT encoder and a RoBERTa decoder.
For training, the model was warm-started using
[vit-small-patch16-224-single-channel](https://huggingface.co/grascii/vit-small-patch16-224-single-channel)
for the encoder and a randomly initialized RoBERTa network for the decoder.
### Training Data
Gregg Vision v0.2.1 was trained on the [gregg-preanniversary-words](https://huggingface.co/datasets/grascii/gregg-preanniversary-words) dataset.
### Training Hardware
Gregg Vision v0.2.1 was trained on a single NVIDIA T4 GPU.