---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---
# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!
TeLVE is the first Visual Language Model designed specifically for Turkish language understanding and image description generation. Built on pre-trained Vision Transformer (ViT) and BERT encoders, it aims to bridge the gap in Turkish vision-language processing.
![TeLVE logo](<teLVE_logo.png>)
## Model Description
TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
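
To make the fusion step concrete, here is a minimal sketch of how text tokens could attend over image patch features with cross-attention. It is an illustration under stated assumptions, not the TeLVE implementation: the class name `CrossAttentionFusion`, the head count, and the residual-plus-LayerNorm layout are hypothetical, and random tensors stand in for real ViT and BERT outputs.

```python
import torch
from torch import nn


class CrossAttentionFusion(nn.Module):
    """Hypothetical fusion block: text tokens attend over image patch features."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, text_states: torch.Tensor, image_states: torch.Tensor) -> torch.Tensor:
        # Queries come from the text side; keys and values come from the image side.
        attended, _ = self.attn(query=text_states, key=image_states, value=image_states)
        # Residual connection followed by layer normalization.
        return self.norm(text_states + attended)


# Random tensors stand in for real encoder outputs (assumption for illustration).
# ViT-base-patch16-224 yields 197 tokens (196 patches + [CLS]) of width 768;
# BERT-base also uses a hidden width of 768.
image_states = torch.randn(2, 197, 768)
text_states = torch.randn(2, 16, 768)

fusion = CrossAttentionFusion()
fused = fusion(text_states, image_states)
print(fused.shape)  # one fused vector per text token
```

Because both encoders share a hidden width of 768, this sketch needs no projection layer between them; a model that pairs encoders of different widths would need one.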
### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset.
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoding problem with the letter "ü" was fixed. *(Deprecated: because of a dataset addressing problem during training, the model was trained on only about half of the dataset and its performance decreased. Not recommended for use.)*
## Usage
The model can be used in two ways:
### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Reads images from the `images` directory
- Generates a Turkish caption for each image
- Prints the results to the console
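
The directory-walking part of the steps above can be sketched as follows. This is a hedged illustration, not the contents of `imagine.py`: the helper names (`iter_images`, `caption_directory`, `placeholder_caption`) and the set of accepted file extensions are assumptions, and the placeholder function stands in for the call to a loaded TeLVE model.

```python
from pathlib import Path
from typing import Callable, Iterator

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_images(directory: Path) -> Iterator[Path]:
    """Yield image files in `directory`, sorted by name."""
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def caption_directory(directory: Path, caption_fn: Callable[[Path], str]) -> dict[str, str]:
    """Caption every image in `directory`, print each result, and return them all."""
    results: dict[str, str] = {}
    for path in iter_images(directory):
        results[path.name] = caption_fn(path)
        print(f"{path.name}: {results[path.name]}")
    return results


def placeholder_caption(path: Path) -> str:
    # Hypothetical stand-in; a real run would call the loaded TeLVE model here.
    return f"<caption for {path.stem}>"


if __name__ == "__main__":
    images_dir = Path("images")
    if images_dir.is_dir():
        caption_directory(images_dir, placeholder_caption)
```

Passing the captioner in as a function keeps the file handling separate from the model, so the loop can be checked without downloading any weights.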
### Training (main.py)
Users can train their own models with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```
This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
## Performance
Performance scores have not been evaluated yet.
<!--
| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
|--------------|---------|---------|---------|--------|
| TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
| TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
## Citation
```bibtex
@software{telve2024,
author = {Öğüt Su Karagün},
title = {TeLVE: Turkish efficient Language Vision Engine},
year = {2024},
url = {https://huggingface.co/outsu/TeLVE}
}
```
## License
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).