ashvardanian committed
Commit a01951e
1 Parent(s): e2c6da8
Update README.md
README.md CHANGED
@@ -18,10 +18,10 @@ In Python, JavaScript, and Swift<br/>
 ---
 
 The `uform3-image-text-english-small` UForm model is a tiny vision and English language encoder, mapping them into a shared vector space.
-This model is made of:
+This model produces up to __256-dimensional embeddings__ and is made of:
 
-* Text encoder: 4-layer BERT.
-* Visual encoder: ViT-S/16 for images of
+* Text encoder: 4-layer BERT for up to 64 input tokens.
+* Visual encoder: ViT-S/16 for images of 224 x 224 resolution.
 
 Unlike most CLIP-like multimodal models, this model shares 2 layers between the text and visual encoder to allow for more data- and parameter-efficient training.
 Also unlike most models, UForm provides checkpoints compatible with PyTorch, ONNX, and CoreML, covering the absolute majority of AI-capable devices, with pre-quantized weights and inference code.
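For context, the description above is typically exercised through the `uform` Python package. The sketch below is a minimal, non-authoritative example assuming the v3-style `get_model` / `Modality` API and a placeholder image file `panda.jpg`; exact entry points and return shapes may differ between `uform` releases.

```python
# Minimal sketch: producing the shared 256-dimensional embeddings described above.
# Assumes `pip install uform` (v3+) with its `get_model` / `Modality` API; the image
# path `panda.jpg` is a placeholder, not part of the model card.
from uform import get_model, Modality
from PIL import Image
import torch.nn.functional as F

processors, models = get_model('unum-cloud/uform3-image-text-english-small')

processor_text = processors[Modality.TEXT_ENCODER]
processor_image = processors[Modality.IMAGE_ENCODER]
model_text = models[Modality.TEXT_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]

# Text is tokenized for the 4-layer BERT encoder (up to 64 input tokens).
text_data = processor_text('a red panda sitting on a tree branch')
_, text_embedding = model_text.encode(text_data, return_features=True)

# Images are resized to 224 x 224 for the ViT-S/16 encoder.
image_data = processor_image(Image.open('panda.jpg'))
_, image_embedding = model_image.encode(image_data, return_features=True)

# Both vectors live in the same space, so cosine similarity is meaningful.
print(F.cosine_similarity(text_embedding, image_embedding))
```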