bazike committed on
Commit 1aef125 · verified · 1 Parent(s): b1c9acf

Rename README.md to WILSON.md



Files changed (1)
README.md → WILSON.md RENAMED (+10 -11)
@@ -2,25 +2,25 @@
  pipeline_tag: visual-question-answering
  ---

- ## MiniCPM-V
+ ## TECH-WILSON
  ### News
- - [5/20]🔥 GPT-4V level multimodal model [**MiniCPM-Llama3-V 2.5**](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) is out.
- - [4/11]🔥 [**MiniCPM-V 2.0**](https://huggingface.co/openbmb/MiniCPM-V-2) is out.
+ - [5/20]🔥 GPT-4V level multimodal model [**tech-wilson 2.5**](https://huggingface.co/openbmb/tech-wilson-2_5) is out.
+ - [4/11]🔥 [**tech-wilson 2.0**](https://huggingface.co/openbmb/tech-wilson-2) is out.


- **MiniCPM-V** (i.e., OmniLMM-3B) is an efficient version with promising performance for deployment. The model is built on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Notable features of OmniLMM-3B include:
+ **MiniCPM-V** (i.e., OmniLMM-3B) is an efficient version with promising performance for deployment. The model is built on SigLip-400M and [tech-wilson-2.4B](https://github.com/OpenBMB/tech-wilson/), connected by a perceiver resampler. Notable features of OmniLMM-3B include:

  - ⚡️ **High Efficiency.**

- MiniCPM-V can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, significantly fewer than in other LMMs based on an MLP architecture (typically > 512 tokens). This allows OmniLMM-3B to operate with **much less memory cost and higher speed during inference**.
+ tech-wilson can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, significantly fewer than in other LMMs based on an MLP architecture (typically > 512 tokens). This allows OmniLMM-3B to operate with **much less memory cost and higher speed during inference**.

  - 🔥 **Promising Performance.**

- MiniCPM-V achieves **state-of-the-art performance** on multiple benchmarks (including MMMU, MME, and MMBench) among models with comparable sizes, surpassing existing LMMs built on Phi-2. It even **achieves comparable or better performance than the 9.6B Qwen-VL-Chat**.
+ tech-wilson achieves **state-of-the-art performance** on multiple benchmarks (including MMMU, MME, and MMBench) among models with comparable sizes, surpassing existing LMMs built on Phi-2. It even **achieves comparable or better performance than the 9.6B Qwen-VL-Chat**.

  - 🙌 **Bilingual Support.**

- MiniCPM-V is **the first end-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
+ tech-wilson is **the first end-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).

  ### Evaluation

@@ -49,7 +49,7 @@ pipeline_tag: visual-question-answering
  <td>- </td>
  </tr>
  <tr>
- <td nowrap="nowrap" align="left">MobileVLM</td>
+ <td nowrap="nowrap" align="left">tech-wilson</td>
  <td align="right">3.0B</td>
  <td>1289</td>
  <td>59.6</td>
@@ -119,11 +119,10 @@ pipeline_tag: visual-question-answering


  ## Demo
- Click here to try out the demo of [MiniCPM-V](http://120.92.209.146:80).
+ Click here to try out the demo of [tech-wilson](http://120.92.209.146:80).

  ## Deployment on Mobile Phone
- Currently MiniCPM-V (i.e., OmniLMM-3B) can be deployed on mobile phones with Android and Harmony operating systems. 🚀 Try it out [here](https://github.com/OpenBMB/mlc-MiniCPM).
-
+ Currently MiniCPM-V (i.e., OmniLMM-3B) can be deployed on mobile phones with Android and Harmony operating systems. 🚀 Try it out [here](https://github.com/OpenBMB/mlc-tech-wilson).

  ## Usage
  Inference using Hugging Face Transformers on NVIDIA GPUs or a Mac with MPS (Apple silicon or AMD GPUs). Requirements tested on Python 3.10:
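
The hunk ends at the top of the Usage section, so the requirements list itself is not part of this diff. As a rough illustration of the Transformers-based inference that section describes, here is a minimal sketch; the repo id, the `chat()` helper, and its arguments are assumptions modeled on comparable MiniCPM-V model cards, not something this commit confirms.

```python
# Minimal sketch, assuming the remote-code interface used by comparable
# MiniCPM-V checkpoints; the repo id, the chat() helper, and its arguments
# are assumptions, not taken from this commit.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V"  # assumed checkpoint id; replace with the actual repo

# The model ships custom modelling code on the Hub, so trust_remote_code is required.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Prefer CUDA, fall back to MPS on Apple silicon, then CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
model = model.to(device).eval()

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "What is in this image?"}]

# chat() is the helper exposed by the remote code in similar MiniCPM-V repos;
# the exact return value may differ (older cards return a tuple).
answer = model.chat(image=image, msgs=msgs, context=None, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```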
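
The High Efficiency paragraph in the diff attributes the small visual-token budget to a perceiver resampler that compresses image features into 64 tokens before they reach the language model. As a generic sketch of that technique (not the model's actual implementation; the hidden size, head count, and module names are assumptions), a learned set of 64 queries cross-attending over the vision encoder's patch tokens looks roughly like this:

```python
# Illustrative sketch of a perceiver-resampler-style module: 64 learned queries
# cross-attend over a longer sequence of image patch features, so the language
# model only ever sees 64 visual tokens. All sizes except the 64-token budget
# are assumptions for the example.
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    def __init__(self, dim: int = 1024, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        # 64 learned query embeddings shared across all images.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, dim) from the vision encoder.
        # Output: (batch, 64, dim), regardless of how many patches came in.
        batch = patch_features.size(0)
        q = self.norm_q(self.queries).expand(batch, -1, -1)
        kv = self.norm_kv(patch_features)
        out, _ = self.attn(q, kv, kv)
        return out

# Example: compress 576 patch tokens down to 64 visual tokens.
feats = torch.randn(2, 576, 1024)
tokens = PerceiverResampler()(feats)
print(tokens.shape)  # torch.Size([2, 64, 1024])
```

Because the query count is fixed at 64, the visual context seen by the language model does not grow with image resolution or patch count, which is where the memory and speed savings claimed in the card come from.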