outsu committed
Commit 9ce6246 · verified · 1 Parent(s): 012f8b5

Metadata updated.

Files changed (1): README.md (+79 -69)
README.md CHANGED
---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---
# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!

TeLVE is the first Visual Language Model designed specifically for Turkish language understanding and image description generation. Built on pre-trained Vision Transformer (ViT) and BERT encoders, it bridges the gap in Turkish visual-linguistic processing.

![TeLVE logo](teLVE_logo.png)

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
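
To make the coupling concrete, here is a minimal sketch of how a ViT encoder and a Turkish BERT decoder can be tied together through cross-attention using Hugging Face `transformers`' `VisionEncoderDecoderModel`. This is an illustration of the general recipe, not TeLVE's own model class; the checkpoint names below are the public backbones listed above, and the repository's implementation may differ.

```python
# Illustrative sketch: wire a ViT encoder to a Turkish BERT decoder via cross-attention.
# TeLVE's own implementation in this repository may differ from this setup.
from transformers import AutoTokenizer, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224",      # vision encoder backbone
    "dbmdz/bert-base-turkish-cased",    # Turkish BERT, reused as the text decoder
)
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

# Generation needs start, pad, and end tokens taken from the BERT tokenizer.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```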

### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset

## Usage

The model can be used in two ways:

### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Reads images from the `images` directory
- Generates a Turkish caption for each image
- Prints the results to the console
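
As a rough picture of what one inference step looks like, the sketch below captions a single image with the `VisionEncoderDecoderModel` setup from the Model Description section. The checkpoint id, file name, and generation settings are placeholders; the real `imagine.py` may load and run the model differently.

```python
# Hypothetical inference sketch; imagine.py's actual loading code may differ.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("outsu/TeLVE")   # assumed checkpoint id
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

image = Image.open("images/ornek.jpg").convert("RGB")              # placeholder file name
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))   # Turkish caption
```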

### Training (main.py)
Users can train their own models with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
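
For orientation, the sketch below shows the kind of loop such a training script typically runs. It assumes a dataset that already yields preprocessed `pixel_values` and tokenized caption `labels`; the batch size, learning rate, and output path are illustrative and not taken from main.py.

```python
# Illustrative training loop; main.py's real data pipeline and hyperparameters may differ.
import torch
from torch.utils.data import DataLoader

def train(model, tokenizer, dataset, epochs=3, lr=5e-5, device="cuda"):
    """dataset is assumed to yield dicts with 'pixel_values' and tokenized 'labels'."""
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            pixel_values = batch["pixel_values"].to(device)
            labels = batch["labels"].to(device)            # tokenized Turkish captions
            loss = model(pixel_values=pixel_values, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    model.save_pretrained("./TeLVE")                       # placeholder output path
    tokenizer.save_pretrained("./TeLVE")
```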

## Performance
Performance scores have not yet been evaluated; benchmark results will be added here.
<!--
| Model Version | Dataset         | BLEU-4 | METEOR | CIDEr |
|---------------|-----------------|--------|--------|-------|
| TeLVE v1.0    | Unsplash        | *TBD*  | *TBD*  | *TBD* |
| TeLVE v1.1    | Unsplash+Pexels | *TBD*  | *TBD*  | *TBD* |
-->

## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title  = {TeLVE: Turkish efficient Language Vision Engine},
  year   = {2024},
  url    = {https://huggingface.co/outsu/TeLVE}
}
```

## License
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).