	---
	license: cc-by-4.0
	language:
	- en
	- tr
	tags:
	- VLM
	- image2text
	- lm
	---
	# TeLVE: Turkish efficient Language Vision Engine 🧿
	[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
	[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
	## First Turkish VLM ever!

	TeLVE is the first vision-language model designed specifically for Turkish language understanding and image description generation. It is built on pre-trained Vision Transformer (ViT) and BERT encoders and aims to close the gap in Turkish vision-language processing.
	![TeLVE logo](<teLVE_logo.png>)

	## Model Description

	TeLVE combines:
	- 🖼️ Vision Transformer (ViT-base-patch16-224)
	- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
	- 🔄 Cross-attention mechanism for vision-language fusion
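
The fusion step listed above can be illustrated with a minimal NumPy sketch of single-head cross-attention. It is not TeLVE's actual implementation: it assumes that text tokens act as queries over image patch features, and it uses random arrays in place of real encoder outputs. The function names, the projection weights, and the caption length of 12 are hypothetical. The sizes 768 and 197 are the standard hidden size and token count of ViT-base-patch16-224 on a 224 x 224 input.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(text_states, image_states, w_q, w_k, w_v):
    """Single-head cross-attention: text tokens (queries) attend to image patches.

    text_states:  (num_tokens, hidden)
    image_states: (num_patches, hidden)
    Returns the fused text states and the attention weights.
    """
    q = text_states @ w_q                      # (num_tokens, hidden)
    k = image_states @ w_k                     # (num_patches, hidden)
    v = image_states @ w_v                     # (num_patches, hidden)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (num_tokens, num_patches)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
hidden = 768        # hidden size shared by ViT-base and BERT-base
num_patches = 197   # 14 x 14 patches plus the [CLS] token
num_tokens = 12     # hypothetical caption length

# Random stand-ins for encoder outputs and projection weights (illustration only).
image_states = rng.standard_normal((num_patches, hidden))
text_states = rng.standard_normal((num_tokens, hidden))
w_q, w_k, w_v = (rng.standard_normal((hidden, hidden)) * 0.02 for _ in range(3))

fused, weights = cross_attention(text_states, image_states, w_q, w_k, w_v)
print(fused.shape)    # (12, 768)
print(weights.shape)  # (12, 197)
```

Each row of `weights` shows how strongly one text token attends to each image patch. A full model would typically add multiple heads, residual connections, and layer normalization around this step.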

	### Version Logs
	- TeLVE v1.0: Trained on the Unsplash Lite dataset.
	- TeLVE v1.0dep: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. (Deprecated: a dataset addressing problem during training meant the model was trained on only about half of the dataset, and performance decreased. Not recommended for use.)

	## Usage

	The model can be used in two ways:

	### Inference (imagine.py)
	```bash
	# Generate captions for images
	python imagine.py
	```
	This script:
	- Loads a trained TeLVE model
	- Takes images from the `images` directory
	- Generates a Turkish caption for each image
	- Prints the results to the console
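
The loop described above can be sketched as follows. This is an assumption-laden outline, not the contents of `imagine.py`: the helper names `list_images`, `caption_directory`, and `placeholder_caption` are hypothetical, the set of accepted file extensions is a guess, and the model call is replaced by a stand-in that returns a fixed string.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of supported formats


def list_images(directory):
    """Return image files in `directory`, sorted by name for reproducible output."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_directory(directory, caption_fn):
    """Apply `caption_fn` to every image and print one `name: caption` line each."""
    results = {}
    for path in list_images(directory):
        results[path.name] = caption_fn(path)
        print(f"{path.name}: {results[path.name]}")
    return results


def placeholder_caption(path):
    """Hypothetical stand-in for the model call; returns a fixed string."""
    return "<caption>"


if __name__ == "__main__":
    # Only runs when an `images` directory exists next to the script.
    if Path("images").is_dir():
        caption_directory("images", placeholder_caption)
```

Passing the captioning function as an argument keeps the directory handling independent of how the model is loaded, so the stand-in can be swapped for a real model call.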

	### Training (main.py)
	Users can train their own models with ViT and BERT encoders.
	```bash
	# Train a new model
	python main.py
	```

	This script:
	- Loads and preprocesses image-caption pairs
	- Initializes ViT and BERT encoders
	- Trains the combined model
	- Saves the model and tokenizer
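
As an illustration of the first step, the sketch below loads image-caption pairs and counts the rows it has to skip. The on-disk format is an assumption: a UTF-8 CSV file with `image` and `caption` columns and a separate image directory. Neither the layout nor the `load_pairs` helper comes from `main.py`.

```python
import csv
from pathlib import Path


def load_pairs(captions_csv, image_dir):
    """Read (image_path, caption) pairs and drop rows whose image is missing.

    Assumes a CSV with `image` and `caption` columns (hypothetical layout).
    Returns the usable pairs and the number of rows that were skipped.
    """
    image_dir = Path(image_dir)
    pairs, skipped = [], 0
    # Explicit UTF-8 avoids platform-dependent decoding of Turkish characters.
    with open(captions_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            path = image_dir / row["image"]
            caption = row["caption"].strip()
            if path.is_file() and caption:
                pairs.append((path, caption))
            else:
                skipped += 1
    return pairs, skipped
```

Checking the `skipped` count before training makes it easy to notice when a large share of the dataset cannot be resolved to files on disk.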


	## Performance
	Performance scores have not been evaluated yet.
	<!--
	| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
	|---------------|---------|--------|--------|-------|
	| TeLVE v1.0 | Unsplash | TBD | TBD | TBD |
	| TeLVE v1.1 | Unsplash+Pexels | TBD | TBD | TBD |
	-->

	## Citation

	```bibtex
	@software{telve2024,
	author = {Öğüt Su Karagün},
	title = {TeLVE: Turkish efficient Language Vision Engine},
	year = {2024},
	url = {https://huggingface.co/outsu/TeLVE}
	}
	```

	## License
	This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).