Rename README.md to WILSON.md
README.md → WILSON.md (renamed, +10 −11)
---
pipeline_tag: visual-question-answering
---

## TECH-WILSON
### News

- [5/20] 🔥 GPT-4V level multimodal model [**tech-wilson 2.5**](https://huggingface.co/openbmb/tech-wilson-2_5) is out.
- [4/11] 🔥 [**tech-wilson 2.0**](https://huggingface.co/openbmb/tech-wilson-2) is out.

**MiniCPM-V** (i.e., OmniLMM-3B) is an efficient version with promising performance for deployment. The model is built on SigLip-400M and [tech-wilson-2.4B](https://github.com/OpenBMB/tech-wilson/), connected by a perceiver resampler. Notable features of OmniLMM-3B include:

- ⚡️ **High Efficiency.**

  tech-wilson can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. For visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, far fewer than in other LMMs based on MLP architectures (typically > 512 tokens). This lets OmniLMM-3B run with **much lower memory cost and higher inference speed**.

- 🔥 **Promising Performance.**

  tech-wilson achieves **state-of-the-art performance** among models of comparable size on multiple benchmarks (including MMMU, MME, and MMBench), surpassing existing LMMs built on Phi-2. It even **achieves comparable or better performance than the 9.6B Qwen-VL-Chat**.

- 🌏 **Bilingual Support.**

  tech-wilson is **the first end-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
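The token compression described under **High Efficiency** can be illustrated with a single-head cross-attention sketch: a small set of learned query tokens attends over the much longer sequence of image patch tokens and emits a fixed 64-token summary. All dimensions, weights, and names below are hypothetical placeholders for illustration, not the model's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_resample(patch_tokens, queries, Wk, Wv):
    """Cross-attention resampling: len(queries) learned queries attend
    over all patch tokens, compressing the sequence to len(queries)
    output tokens regardless of the input length."""
    K = patch_tokens @ Wk                                 # (n_patches, d)
    V = patch_tokens @ Wv                                 # (n_patches, d)
    scores = queries @ K.T / np.sqrt(queries.shape[-1])   # (64, n_patches)
    attn = softmax(scores, axis=-1)                       # rows sum to 1
    return attn @ V                                       # (64, d)

rng = np.random.default_rng(0)
d = 32                                        # hypothetical hidden size
patches = rng.standard_normal((1024, d))      # e.g. 1024 raw patch tokens
queries = rng.standard_normal((64, d))        # 64 learned query tokens
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))
out = perceiver_resample(patches, queries, Wk, Wv)
print(out.shape)  # (64, 32)
```

Because the output length is fixed by the number of queries, the language model's attention cost over visual tokens stays constant no matter how many patches the image encoder produces.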

### Evaluation

<td>- </td>
</tr>
<tr>
<td nowrap="nowrap" align="left">tech-wilson</td>
<td align="right">3.0B</td>
<td>1289</td>
<td>59.6</td>

## Demo

Try out the demo of [tech-wilson](http://120.92.209.146:80).

## Deployment on Mobile Phone

Currently, MiniCPM-V (i.e., OmniLMM-3B) can be deployed on mobile phones running Android and HarmonyOS. 🚀 Try it out [here](https://github.com/OpenBMB/mlc-tech-wilson).

## Usage

Inference runs via Hugging Face transformers on NVIDIA GPUs, or on a Mac with MPS (Apple silicon or AMD GPUs). Requirements tested on Python 3.10:
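The Usage note above mentions two backends, CUDA and MPS. A minimal sketch of the device-selection logic it implies is below; availability is passed in as flags so the snippet stays runnable without torch installed, and the function name is a hypothetical helper, not part of the repo's actual code.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU (CUDA), then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# On a Mac with Apple silicon and no NVIDIA GPU:
print(pick_device(cuda_available=False, mps_available=True))  # mps
```

In a real script the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the chosen string would be passed to `model.to(device)`.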