BoltzmannEntropy committed on
Commit d4e3940 · 1 Parent(s): a28cb80
Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -8,13 +8,14 @@ pinned: false
 license: mit
 ---
 
+
 # VLM-Image-Analysis: A Vision-and-Language Modeling Framework
 
 Welcome to the Hugging Face Space (https://huggingface.co/spaces/BoltzmannEntropy/vlms) for VLM-Image-Analysis. This Space showcases a framework that combines multiple Vision-Language Models (VLMs) with a Large Language Model (LLM) to provide comprehensive image analysis and captioning.
 
 <h1 align="center">
 <img src="static/image.jpg" width="50%">
-<h6> (Adapted from wang2023allseeing: https://huggingface.co/datasets/OpenGVLab/CRPE?row=1) </h6>
+<h6> (Source wang2023allseeing: https://huggingface.co/datasets/OpenGVLab/CRPE?row=1) </h6>
 </h1>
 
 This repository contains the core code for a multi-model framework that enhances image interpretation by combining several Vision-and-Language Modeling (VLM) systems. VLM-Image-Analysis delivers detailed, multi-faceted analyses of images by leveraging N cutting-edge VLMs, each pre-trained on a wide range of datasets to detect diverse visual cues and linguistic patterns.
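
For readers wondering how a "multiple VLMs plus one LLM" pipeline like the one described above might be wired together, here is a minimal sketch using Hugging Face `transformers` pipelines. The model checkpoints, the prompt, and the `analyze` helper are illustrative assumptions, not the Space's actual code.

```python
# Minimal sketch of the multi-VLM idea: caption an image with several
# VLMs, then ask an LLM to merge the captions into one analysis.
# Checkpoints below are assumed examples; any image-to-text models work.
from transformers import pipeline

VLM_CHECKPOINTS = [
    "Salesforce/blip-image-captioning-base",
    "nlpconnect/vit-gpt2-image-captioning",
]

def analyze(image_path: str) -> str:
    # 1. Collect one caption per VLM.
    captions = []
    for ckpt in VLM_CHECKPOINTS:
        captioner = pipeline("image-to-text", model=ckpt)
        captions.append(captioner(image_path)[0]["generated_text"])

    # 2. Have an LLM synthesize the per-model captions into one text.
    #    gpt2 is a lightweight stand-in; an instruction-tuned LLM would
    #    normally be used here.
    llm = pipeline("text-generation", model="gpt2")
    prompt = (
        "Combine these image descriptions into one detailed caption:\n"
        + "\n".join(f"- {c}" for c in captions)
        + "\nCombined:"
    )
    # The pipeline echoes the prompt, so strip it from the output.
    return llm(prompt, max_new_tokens=80)[0]["generated_text"][len(prompt):]

print(analyze("static/image.jpg"))
```

Each VLM contributes a caption shaped by its own pre-training data, and the LLM pass is what turns those partially overlapping descriptions into a single multi-faceted analysis.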