PyTorch
English
monkey
custom_code

This is the model repository for the paper "EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data".

The model is fine-tuned from Monkey. To speed up training, we also made a few minor modifications:

  1. Instead of using the LoRA adapters in Monkey, we stack the five patches of the raw image along an extra batch dimension and send them through the image encoder in a single pass.
  2. Inside the image encoder, we use flash attention instead of the manually implemented attention.
  3. We separate image reading from the forward pass and move it into dataset preprocessing, so that the PyTorch DataLoader can speed up image loading.
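
The three modifications above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the EDGE repository: `TinyEncoder`, `encode_views`, and `ViewDataset` are hypothetical names, random tensors stand in for decoded image patches, and `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+) stands in for whichever flash attention implementation the model actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the image encoder: one self-attention layer."""

    def __init__(self, dim=32, heads=4, patch=8):
        super().__init__()
        self.heads = heads
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (N, 3, H, W)
        t = self.embed(x).flatten(2).transpose(1, 2)  # (N, T, dim)
        n, tok, dim = t.shape
        qkv = self.qkv(t).view(n, tok, 3, self.heads, dim // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each (N, heads, T, head_dim)
        # Modification 2 (assumed form): a fused attention call instead of a
        # hand-written softmax(QK^T)V. PyTorch may pick a flash kernel on GPU.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(n, tok, dim)
        return self.proj(out)


def encode_views(encoder, views):
    """Modification 1: fold the view axis into the batch axis.

    views: (B, V, 3, H, W), where V is the number of patches per image.
    Returns features of shape (B, V, T, dim) from a single encoder pass.
    """
    b, v = views.shape[:2]
    feats = encoder(views.flatten(0, 1))  # (B * V, T, dim)
    return feats.reshape(b, v, *feats.shape[1:])


class ViewDataset(Dataset):
    """Modification 3: image loading happens in __getitem__, not in forward()."""

    def __init__(self, num_items, num_views=5, size=32):
        self.num_items, self.num_views, self.size = num_items, num_views, size

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        # Placeholder for reading an image file and cutting it into patches.
        gen = torch.Generator().manual_seed(idx)
        return torch.rand(self.num_views, 3, self.size, self.size, generator=gen)


# num_workers=0 keeps the demo single-process; a larger value lets worker
# processes prepare batches in parallel with the training loop.
loader = DataLoader(ViewDataset(4), batch_size=2, num_workers=0)
encoder = TinyEncoder().eval()
with torch.no_grad():
    for views in loader:
        feats = encode_views(encoder, views)
        print(tuple(views.shape), "->", tuple(feats.shape))
```

Because the encoder never mixes information across the batch axis, encoding the stacked patches together should give the same features as encoding each patch separately, while issuing one large call instead of five small ones.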

The training dataset (all training QA pairs in .jsonl format, without the images) is published in the EDGE-Dataset repository.
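
Since the QA data is distributed as .jsonl (one JSON object per line), it can be read with the standard library alone. The sketch below is illustrative: `read_jsonl` is a hypothetical helper, and the demo file with its `image`, `question`, and `answer` fields is made up for the example. The actual field names in EDGE-Dataset may differ.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


# Made-up demo records; the field names are placeholders, not the real schema.
demo = Path(tempfile.mkdtemp()) / "demo.jsonl"
demo.write_text(
    '{"image": "demo_0.png", "question": "Where is the Submit button?", "answer": "demo"}\n'
    "\n"
    '{"image": "demo_1.png", "question": "What does this icon do?", "answer": "demo"}\n',
    encoding="utf-8",
)

records = list(read_jsonl(demo))
print(len(records), sorted(records[0]))
```

Reading lazily with a generator keeps memory use flat for large files; wrap the call in `list(...)` only when the whole set fits in memory.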

The model training and inference scripts are published in the anonymous EDGE repository.

