---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---

# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!

TeLVE is the first Visual Language Model designed specifically for Turkish language understanding and image description generation. It combines a pre-trained Vision Transformer (ViT) image encoder with a pre-trained Turkish BERT model, filling a gap in Turkish visual-linguistic processing.
![TeLVE logo](<teLVE_logo.png>)

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion

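
The fusion step listed above can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention. It is written in plain Python so it runs without dependencies, and it rests on several assumptions that this model card does not confirm: text tokens act as queries over image patch vectors, learned projection matrices are left out, and the vector width is shrunk to 8 (ViT-base and BERT-base both use 768). The names `cross_attention`, `text_states`, and `image_states` are hypothetical and are not taken from the TeLVE code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_states, image_states):
    """Single-head scaled dot-product cross-attention (illustrative only).

    Each text vector is a query; image patch vectors serve as keys and values.
    Learned query/key/value projections are omitted to keep the sketch short.
    """
    dim = len(text_states[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for query in text_states:
        # Similarity between this text position and every image patch.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in image_states]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the image patch vectors.
        fused.append([sum(a * patch[j] for a, patch in zip(attn, image_states)) for j in range(dim)])
    return fused, weights


# Toy sizes, assumed for illustration. A 224x224 input cut into 16x16 patches
# gives (224 // 16) ** 2 = 196 patch vectors.
random.seed(0)
dim = 8
image_states = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(196)]
text_states = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]

fused, weights = cross_attention(text_states, image_states)
print(len(fused), len(fused[0]))  # 5 8
```

A practical implementation would more likely use a tensor library and several attention heads. The sketch only shows how each text position ends up as a weighted mix of image patches.
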
### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset.
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels, and the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*

## Usage

The repository provides two scripts, one for inference and one for training:

### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Takes images from the `images` directory
- Generates a Turkish caption for each image
- Prints the results to the console

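
The inference flow described above can be sketched as follows. This is not the contents of `imagine.py`: the helper names `find_images` and `caption_directory`, the accepted file extensions, and the `caption_fn` callback (standing in for the loaded model) are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of accepted image formats; the real script may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(directory):
    """Return the image files in `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_directory(directory, caption_fn):
    """Apply `caption_fn` (hypothetical: path -> caption) to each image and print the result."""
    results = {}
    for path in find_images(directory):
        results[path.name] = caption_fn(path)
        print(f"{path.name}: {results[path.name]}")
    return results


if __name__ == "__main__":
    # Placeholder captioner; a real run would call the loaded model here.
    caption_directory("images", lambda path: "<caption>")
```

Passing the captioner in as a function keeps the directory handling separate from the model call, so it can be checked without loading any weights.
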
### Training (main.py)
You can also train your own model with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes the ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer

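
As an illustration of the first step, the sketch below loads image-caption pairs from a CSV file. The file layout (columns `image` and `caption`), the function name `load_pairs`, the length limit, and the sample rows are hypothetical, since the data format expected by `main.py` is not documented here.

```python
import csv
import io


def load_pairs(csv_file, max_caption_chars=200):
    """Read (image filename, caption) rows and drop unusable ones.

    Assumes a header row with `image` and `caption` columns (hypothetical layout).
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        image = (row.get("image") or "").strip()
        # Collapse runs of whitespace and trim the ends of the caption.
        caption = " ".join((row.get("caption") or "").split())
        if image and caption:
            pairs.append((image, caption[:max_caption_chars]))
    return pairs


# Made-up sample data; the second row has no caption and is skipped.
sample = io.StringIO(
    "image,caption\n"
    "tren.jpg,  Köprüden geçen   bir tren \n"
    "bos.jpg,\n"
)
pairs = load_pairs(sample)
print(pairs)  # [('tren.jpg', 'Köprüden geçen bir tren')]
```

When reading from disk, opening the file with `encoding="utf-8"` keeps Turkish letters such as "ü" intact.
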
## Performance
The model has not been benchmarked yet. Scores will be added once evaluation is complete.
<!--
| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
|--------------|---------|---------|---------|--------|
| TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
| TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |
-->

## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title = {TeLVE: Turkish efficient Language Vision Engine},
  year = {2024},
  url = {https://huggingface.co/outsu/TeLVE}
}
```

## License
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).