license: cc-by-4.0
language:
  - en
  - tr
tags:
  - VLM
  - image2text
  - lm

TeLVE: Turkish efficient Language Vision Engine 🧿

License: CC BY 4.0 | Models: v1.0

First Turkish VLM ever!

TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.

Model Description

TeLVE combines:

  • 🖼️ Vision Transformer (ViT-base-patch16-224)
  • 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
  • 🔄 Cross-attention mechanism for vision-language fusion
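As a rough illustration of how these three pieces fit together, the sketch below computes how many tokens a ViT-base-patch16-224 encoder produces for one image and then runs a toy single-head cross-attention in which text vectors attend over image vectors. It is not TeLVE's actual implementation: the function names are hypothetical, there are no learned projections, and plain Python lists stand in for tensors.

```python
import math


def vit_token_count(image_size=224, patch_size=16):
    """Tokens emitted by a ViT encoder: one per image patch plus one [CLS] token."""
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Toy scaled dot-product attention (hypothetical helper, no learned weights).

    Each text query is compared against every image key; the resulting
    weights mix the image values into one output vector per query.
    """
    dim = len(image_keys[0])
    outputs = []
    for query in text_queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in image_keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * value[j] for w, value in zip(weights, image_values))
            for j in range(len(image_values[0]))
        ]
        outputs.append(mixed)
    return outputs


# A 224x224 image cut into 16x16 patches gives 14 * 14 patches plus [CLS].
print(vit_token_count())
```

In a real model the queries would come from the text side (BERT hidden states) and the keys and values from the ViT output, each passed through learned linear projections first.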

Version Logs

  • TeLVE v1.0: Trained on Unsplash Lite dataset
  • TeLVE v1.0dep: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. (Deprecated: a dataset addressing problem during training meant the model was trained on only about half of the dataset, which reduced performance. Not recommended for use.)

Usage

The model can be used in two ways:

Inference (imagine.py)

# Generate captions for images
python imagine.py

This script:

  • Loads a trained TeLVE model
  • Takes images from the images directory
  • Generates Turkish captions for each image
  • Outputs the results to console
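The steps above can be sketched as a small loop. This is only an outline of the control flow, not the contents of imagine.py: `caption_directory`, `placeholder_caption`, and the set of accepted file extensions are assumptions made for illustration, and the placeholder stands in for the real model call.

```python
from pathlib import Path

# Assumed set of supported image formats; the real script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def caption_directory(image_dir, caption_fn):
    """Return (filename, caption) pairs for each image file in image_dir, sorted by name."""
    results = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results.append((path.name, caption_fn(path)))
    return results


def placeholder_caption(path):
    """Hypothetical stand-in: a trained TeLVE model would return a Turkish caption here."""
    return f"<caption for {path.stem}>"


if __name__ == "__main__":
    # Print one line per image, mirroring the console output described above.
    if Path("images").is_dir():
        for name, caption in caption_directory("images", placeholder_caption):
            print(f"{name}: {caption}")
```

Passing the captioning step in as a function keeps the directory handling separate from the model, so the loop can be tried without loading any weights.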

Training (main.py)

Users can train their own models with ViT and BERT encoders.

# Train a new model
python main.py

This script:

  • Loads and preprocesses image-caption pairs
  • Initializes ViT and BERT encoders
  • Trains the combined model
  • Saves the model and tokenizer
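The first step, loading and preprocessing image-caption pairs, might look roughly like the sketch below. The README does not describe the dataset format, so the CSV layout with `image` and `caption` columns, the function name, and the sample rows are all assumptions chosen for illustration.

```python
import csv
import io


def load_caption_pairs(csv_text):
    """Parse CSV text with assumed 'image' and 'caption' columns into cleaned pairs.

    Rows with a missing image name or an empty caption are dropped, and
    runs of whitespace inside captions are collapsed to single spaces.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        image = (row.get("image") or "").strip()
        caption = " ".join((row.get("caption") or "").split())
        if image and caption:
            pairs.append((image, caption))
    return pairs


# Hypothetical sample data: one usable row and one row without a caption.
sample = "image,caption\nkedi.jpg,  Pencerede oturan   bir kedi \nbos.jpg,\n"
print(load_caption_pairs(sample))
```

Filtering incomplete rows before training avoids feeding the encoders an image without a target caption.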

Performance

Performance has not been evaluated yet; scores will be added once the evaluation is done.

Citation

@software{telve2024,
    author = {Öğüt Su Karagün},
    title = {TeLVE: Turkish efficient Language Vision Engine},
    year = {2024},
    url = {https://huggingface.co/outsu/TeLVE}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.