Update README.md
README.md
CHANGED
@@ -1,80 +1,80 @@
---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---
# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!

TeLVE is the first vision-language model (VLM) designed specifically for Turkish language understanding and image description generation. Built on pre-trained Vision Transformer (ViT) and BERT encoders, it aims to close the gap in Turkish vision-language processing.
![TeLVE logo](<teLVE_logo.png>)

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
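
To make the fusion step more concrete, here is a minimal, dependency-free sketch of single-head scaled dot-product cross-attention, in which text-side queries attend over image-patch features. It illustrates the general technique only and is not TeLVE's actual implementation: the function names `softmax` and `cross_attention`, the toy 2-dimensional vectors, and the omission of learned projection matrices are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    Each text query is scored against every image key; the output for that
    query is the attention-weighted average of the image values.
    """
    dim = len(image_keys[0])
    fused = []
    for query in text_queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in image_keys
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * value[i] for w, value in zip(weights, image_values))
            for i in range(len(image_values[0]))
        ])
    return fused


# Toy example: 2 text tokens attend over 3 image patches (made-up numbers).
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(text_queries, image_patches, image_patches)
print(len(fused), len(fused[0]))
```

In a real model the queries, keys, and values would come from learned linear projections of the BERT and ViT hidden states, and several attention heads would run in parallel.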

### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset.
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*

## Usage

The model can be used in two ways:

### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Takes images from the `images` directory
- Generates Turkish captions for each image
- Prints the results to the console
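
The steps above can be sketched roughly as follows. This is not the contents of `imagine.py`: `find_images`, `generate_caption`, and `caption_directory` are hypothetical names, the list of accepted file extensions is an assumption, and the model call is replaced by a placeholder so the loop can run without the trained weights.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(directory):
    """Return image files in `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def generate_caption(image_path):
    """Hypothetical placeholder for the model call; returns a dummy string."""
    return f"<caption for {image_path.name}>"


def caption_directory(directory="images"):
    """Caption every image in `directory` and print the results."""
    results = {}
    for image_path in find_images(directory):
        results[image_path.name] = generate_caption(image_path)
        print(f"{image_path.name}: {results[image_path.name]}")
    return results


if __name__ == "__main__":
    caption_directory("images")
```

Swapping the body of `generate_caption` for a call into the loaded model would turn this skeleton into a working captioning loop.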

### Training (main.py)
Users can train their own models with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
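
As an illustration of the first step, the sketch below loads image-caption pairs from a CSV file and normalizes whitespace in the captions. The README does not specify the dataset format, so the CSV layout, the column names `image` and `caption`, and the helper name `load_pairs` are assumptions rather than a description of `main.py`.

```python
import csv
import io


def load_pairs(csv_file):
    """Read (image filename, caption) pairs from a CSV with a header row.

    The column names `image` and `caption` are assumptions for illustration.
    Rows with a missing filename or an empty caption are skipped.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        image = (row.get("image") or "").strip()
        # Collapse runs of whitespace; casing is left untouched because the
        # text encoder named above is a cased model.
        caption = " ".join((row.get("caption") or "").split())
        if image and caption:
            pairs.append((image, caption))
    return pairs


# Toy in-memory CSV standing in for a real captions file.
sample = io.StringIO(
    "image,caption\n"
    "kedi.jpg,  Pencerede oturan   bir kedi \n"
    "bos.jpg,\n"
)
pairs = load_pairs(sample)
print(pairs)
```

The resulting pairs would then be passed to the image processor and the tokenizer before training.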


## Performance
Performance scores have not been evaluated yet; results will be added here.
<!--
| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
|--------------|---------|---------|---------|--------|
| TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
| TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->

## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title = {TeLVE: Turkish efficient Language Vision Engine},
  year = {2024},
  url = {https://huggingface.co/outsu/TeLVE}
}
```

## License
<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://huggingface.co/outsu/TeLVE">TeLVE</a> by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://outsu.github.io">Öğüt Su Karagün</a> is licensed under <a href="https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">Creative Commons Attribution 4.0 International</a></p>