outsu committed (verified) · Commit 1f07473 · Parent: 206ebe3

Update README.md

Files changed (1): README.md (+80 −80)
---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---
# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!

TeLVE is the first Visual Language Model designed specifically for Turkish language understanding and image description generation. Built on pre-trained Vision Transformer (ViT) and BERT encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
![TeLVE logo](teLVE_logo.png)

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
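The fusion step can be pictured as a cross-attention block in which text token states (queries) attend to image patch embeddings (keys/values). The PyTorch sketch below is illustrative only, not TeLVE's actual module; the 768-dim width matches both ViT-base-patch16-224 and bert-base-turkish-cased.

```python
import torch
import torch.nn as nn

class VisionLanguageFusion(nn.Module):
    """Minimal cross-attention fusion: text tokens attend to image patches."""
    def __init__(self, hidden_size=768, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, text_states, image_states):
        # Queries come from BERT token states; keys/values from ViT patch embeddings.
        attended, _ = self.cross_attn(text_states, image_states, image_states)
        return self.norm(text_states + attended)  # residual connection + layer norm

# ViT-base-patch16-224 produces 197 tokens (196 patches + [CLS]) of width 768.
image_states = torch.randn(1, 197, 768)
text_states = torch.randn(1, 16, 768)  # 16 Turkish BERT token states
fused = VisionLanguageFusion()(text_states, image_states)
print(fused.shape)  # same shape as the text states: (1, 16, 768)
```

The fused states keep the text sequence length, so they can feed directly into a language-modeling head for caption generation.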

### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance regressed due to a dataset addressing problem. Not recommended for use.)*

## Usage

The model can be used in two ways:

### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Reads images from the `images` directory
- Generates a Turkish caption for each image
- Prints the results to the console
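The internals of `imagine.py` are not shown here, but the described behavior amounts to iterating the `images` directory and captioning each file. A hypothetical sketch, with `generate_caption` standing in for the actual model call:

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def find_images(image_dir):
    # Collect image files from the directory, sorted for stable output.
    return sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_EXTS)

def caption_images(image_dir, generate_caption):
    # `generate_caption` is a stand-in for the TeLVE model call: Path -> Turkish caption.
    return {p.name: generate_caption(p) for p in find_images(image_dir)}

# Demo with a stub captioner over a temporary directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "cat.jpg").touch()
    (Path(d) / "notes.txt").touch()  # skipped: not an image
    captions = caption_images(d, lambda p: f"görsel: {p.stem}")
    print(captions)  # {'cat.jpg': 'görsel: cat'}
```

Swapping the stub for the real model's generate call would reproduce the console output described above.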

### Training (main.py)
Users can train their own models with the ViT and BERT encoders.
```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes the ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
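At its core, one optimization step of such a captioning model projects the fused vision-language states to caption-token logits and backpropagates a cross-entropy loss. A shape-level sketch (random tensors stand in for encoder outputs and tokenized captions; the vocabulary size is a hypothetical value, not TeLVE's actual configuration):

```python
import torch
import torch.nn as nn

vocab_size, hidden = 32000, 768           # hypothetical tokenizer vocab / encoder width
lm_head = nn.Linear(hidden, vocab_size)   # projects fused states to caption-token logits
optimizer = torch.optim.AdamW(lm_head.parameters(), lr=5e-5)

# Stand-ins: fused vision-language states and tokenized Turkish captions.
fused_states = torch.randn(2, 16, hidden)          # (batch, seq_len, hidden)
labels = torch.randint(0, vocab_size, (2, 16))     # (batch, seq_len)

logits = lm_head(fused_states)
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())  # near ln(32000) ≈ 10.4 before any training
```

After the loop finishes, saving would use the standard `save_pretrained` pattern for the model and tokenizer.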


## Performance
Performance has not yet been evaluated; benchmark scores will be added here.
<!--
| Model Version | Dataset         | BLEU-4 | METEOR | CIDEr |
|---------------|-----------------|--------|--------|-------|
| TeLVE v1.0    | Unsplash        | *TBD*  | *TBD*  | *TBD* |
| TeLVE v1.1    | Unsplash+Pexels | *TBD*  | *TBD*  | *TBD* |
-->

## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title  = {TeLVE: Turkish efficient Language Vision Engine},
  year   = {2024},
  url    = {https://huggingface.co/outsu/TeLVE}
}
```

## License
<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://huggingface.co/outsu/TeLVE">TeLVE</a> by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://outsu.github.io">Öğüt Su Karagün</a> is licensed under <a href="https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">Creative Commons Attribution 4.0 International</a></p>