---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
  - laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup
pipeline_tag: question-answering
metrics:
  - accuracy
library_name: transformers
---

# Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

## Model Card for Euclid-convnext-xxlarge (version of 12/05/2024)

A multimodal large language model trained specifically for strong low-level geometric perception.

## Model Details

### Model Description

Euclid is trained on 1.6M synthetic geometry images with high-fidelity question-answer pairs using a curriculum learning approach.

It combines a ConvNeXt visual encoder with a Qwen-2.5 language model, connected through a 2-layer MLP multimodal connector.

### Model Sources

- Paper: [arXiv:2412.08737](https://arxiv.org/abs/2412.08737)

## Uses

The model is trained for precise low-level geometric perception tasks and is able to perform:

- Point-on-line detection
- Point-on-circle detection
- Angle classification
- Length comparison
- Geometric annotation understanding

Please refer to our repo for the full input format.

## Limitations and Applications

Our model is not designed to handle:

- Comprehensive image understanding tasks
- Advanced cognitive reasoning beyond geometric analysis

However, the model demonstrates strength in low-level visual perception.

This capability makes it potentially valuable as a base model for specialized downstream fine-tuning, including:

- Robotic vision and automation systems
- Medical imaging and diagnostic support
- Industrial quality assurance and inspection
- Geometric education and visualization tools

## Example Usage

Clone our Euclid repo first, set up the environment, then run:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download --cache-dir $MODEL_PATH EuclidAI/Euclid-convnext-xxlarge
python euclid/eval/run_euclid_geo.py --model_path $MODEL_PATH --device cuda
```

## Evaluation Results

Performance on Geoperception benchmark tasks:

| Model | POL | POC | ALC | LHC | PEP | PRA | EQL | Overall |
|---|---|---|---|---|---|---|---|---|
| Random Baseline | 0.43 | 2.63 | 59.92 | 51.36 | 0.25 | 0.00 | 0.02 | 16.37 |
| Pixtral-12B | 22.85 | 53.21 | 47.33 | 51.43 | 22.53 | 37.11 | 58.45 | 41.84 |
| Gemini-1.5-Pro | 24.42 | 69.80 | 57.96 | 79.05 | 39.60 | 77.59 | 52.27 | 57.24 |
| EUCLID-ConvNeXt-Large | 80.54 | 57.76 | 86.37 | 88.24 | 42.23 | 64.94 | 34.45 | 64.93 |
| EUCLID-ConvNeXt-XXLarge | 82.98 | 61.45 | 90.56 | 90.82 | 46.96 | 70.52 | 31.94 | 67.89 |

## Citation

If you find Euclid useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{zhang2024euclid,
  title={Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions},
  author={Zhang, Jiarui and Liu, Ollie and Yu, Tianyu and Hu, Jinyi and Neiswanger, Willie},
  journal={arXiv preprint arXiv:2412.08737},
  year={2024}
}
```