5CD-AI/ColVintern-1B-v1

@@ -17,17 +17,16 @@ tags:
   <img src="Vintern3B-logo.jpg" width="700"/>
 </div>
-## Vintern-3B-beta 🇻🇳 ❄️ - The LLaVA 🌋 Challenger
-**What's new in Vintern-3B-beta!**
-- **We successfully reproduced the training process of InternVL from scratch.**
-- The model is the result of integrating [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) and [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) through an MLP layer.
-- Trained on more than 10 million Vietnamese QnAs and descriptions, plus 10% English and Chinese data from [OpenGVLab/InternVL-Chat-V1-2-SFT-Data](https://huggingface.co/datasets/OpenGVLab/InternVL-Chat-V1-2-SFT-Data).
 ## Colpali Benchmarks
-We tested on the **ViDoRe** benchmark from the Colpali paper. The **TabF** and **Shift** test datasets were not used because they are in French. We plan to expand to multiple languages in the near future.
 |                              | ArxivQ | DocQ | InfoQ | TATQ | AI   | Energy | Gov.  | Health. | Avg.   |
 |:------------------------------:|:--------:|:------:|:-------:|:------:|:------:|:--------:|:-------:|:---------:|:--------:|
@@ -52,6 +51,7 @@ We tested on the **ViDoRe** benchmark from the Colpali paper. The **TabF** and *
 | **Ours**                               |        |      |       |      |      |        |       |         |        |
 | - ColVintern-1B (+Late Inter.)  | 71.6   | 48.3 | **84.6**  | 59.6 | 92.9 | 88.7 | 89.4 | **95.2** | 78.8 |
 ## Vietnamese Benchmarks
@@ -75,71 +75,6 @@ Assistant: Bức ảnh là một giao diện ứng dụng đặt hàng thức ă
 ```
-<div align="center">
-  <img src="ex_1.jpg" width="400"/>
-</div>
-```
-User: <image>
-List the entire poem shown in the image.
-Assistant: Travel the whole world and no one is as good as mother; of all life's burdens, no one suffers as much as father
-```
-<div align="center">
-  <img src="ex_2.jpg" width="400"/>
-</div>
-```
-User: <image>
-List the entire poem shown in the image.
-Assistant: The poem reads:
-- Mother is like a lighthouse
-- Like the sun, like the moon
-- Day after day she toils and labors
-- Caring for her children, working diligently.
-```
-<div align="center">
-  <img src="ex_3.png" width="400"/>
-</div>
-```
-User: <image>
-Describe the image in detail.
-Assistant: The photo shows a blackboard with mathematical formulas written in white chalk. In the upper left corner, the text "20/19/SGK" is written in pale yellow. Next is part a) with two equations:
-3x + y = 3
-2x - y = 7
-Below these two equations is an arrow pointing to the right and two new equations:
-5x = 10
-3x + y = 3
-3.2 + y = 3
-After that is the line "x = 2"
-y = -3
-Finally there is the line "So the given system of equations has the solution (x; y) = (2; -3)"
-```
-<div align="center">
-  <img src="ex_4.jpg" width="400"/>
-</div>
-```
-User: <image>
-Describe the image in detail.
-Assistant: The image illustrates how an image is processed by an artificial neural network (CNN). The image is divided into 3 main parts:
-1. **First part:** Shows a picture of a white car with the text "input image" underneath.
-2. **Second part:** Shows 6 square blocks, each representing a layer in the CNN. The layers are numbered from left to right:
-    * Convolutional layer
-    * Pooling layer
-    * Convolutional layer
-    * Pooling layer
-    * Fully-connected layer
-    * Predicted class
-3. **Third part:** Shows a picture with the text "car" underneath.
-A black arrow connects the blocks, showing how data flows between the layers.
-```
 ## Quickstart

   <img src="Vintern3B-logo.jpg" width="700"/>
 </div>
+## ColVintern-1B-v1 🇻🇳 ❄️ - Colpali version for Vietnamese.
+**What's new in ColVintern-1B-v1!**
+- We implemented and successfully trained the **Colpali pipeline for Vintern**. The model supports RAG by producing vector embeddings for questions and for images, so that images containing the relevant information can be retrieved.
+- This is the first experimental version, trained on the English [**Colpali dataset**](https://huggingface.co/datasets/vidore/colpali_train_set) and on **2%** of our Vietnamese image-based question-answer pairs.
+- The model achieves results close to Colpali version 1 while adding strong support for Vietnamese text, with only 1 billion parameters, fewer than current Colpali models.
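The "+Late Inter." label in the benchmark table refers to late-interaction scoring between question embeddings and image embeddings. The sketch below illustrates the general Colpali-style MaxSim rule in plain NumPy; it is not this repository's code. The function names `late_interaction_score` and `rank_pages` and the toy 2-dimensional embeddings are assumptions made for illustration only.

```python
import numpy as np


def late_interaction_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """MaxSim score between one query and one page image.

    query_emb: (n_query_tokens, dim) array, one vector per question token.
    page_emb:  (n_patches, dim) array, one vector per image patch.
    """
    # Dot-product similarity of every query token against every patch.
    sim = query_emb @ page_emb.T  # shape: (n_query_tokens, n_patches)
    # Keep the best-matching patch for each query token, then sum over tokens.
    return float(sim.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, page_embs: list) -> list:
    """Return page indices sorted from most to least relevant."""
    scores = [late_interaction_score(query_emb, p) for p in page_embs]
    return sorted(range(len(page_embs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical values, not produced by the model).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
page_b = np.array([[0.5, 0.5]])

print(late_interaction_score(query, page_a))  # 2.0
print(late_interaction_score(query, page_b))  # 1.0
print(rank_pages(query, [page_a, page_b]))    # [0, 1]
```

In a RAG setup, the top-ranked page images would then be passed to a generator model together with the question.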
 ## Colpali Benchmarks
+We tested on the [**ViDoRe benchmark**](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) from the Colpali paper. The **TabF** and **Shift** test datasets were not used because they are in French. We plan to expand to multiple languages in the near future.
 |                              | ArxivQ | DocQ | InfoQ | TATQ | AI   | Energy | Gov.  | Health. | Avg.   |
 |:------------------------------:|:--------:|:------:|:-------:|:------:|:------:|:--------:|:-------:|:---------:|:--------:|
 | **Ours**                               |        |      |       |      |      |        |       |         |        |
 | - ColVintern-1B (+Late Inter.)  | 71.6   | 48.3 | **84.6**  | 59.6 | 92.9 | 88.7 | 89.4 | **95.2** | 78.8 |
+For upcoming versions, we are expanding the training dataset, adding hard negative mining, and training on GPUs with more VRAM to achieve better results.
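As a quick consistency check on the ColVintern-1B row in the table above, the snippet below recomputes the Avg. column. It assumes that Avg. is the unweighted mean of the eight subsets shown (TabF and Shift excluded), which the source does not state explicitly.

```python
# Scores copied from the "ColVintern-1B (+Late Inter.)" row of the table.
scores = {
    "ArxivQ": 71.6,
    "DocQ": 48.3,
    "InfoQ": 84.6,
    "TATQ": 59.6,
    "AI": 92.9,
    "Energy": 88.7,
    "Gov.": 89.4,
    "Health.": 95.2,
}

# Assumption: Avg. is the plain mean over the eight reported subsets.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # 78.8
```

The result matches the reported 78.8, which is consistent with that assumption.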
 ## Vietnamese Benchmarks
 ```
 ## Quickstart