outsu/TeLVE


outsu committed 28 days ago

Commit 1f07473 (verified) · 1 parent: 206ebe3

Update README.md


Files changed (1)

README.md +80 -80

@@ -1,80 +1,80 @@
----
-license: cc-by-4.0
-language:
-- en
-- tr
-tags:
-- VLM
-- image2text
-- lm
----
-# TeLVE: Turkish efficient Language Vision Engine 🧿
-[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
-## First Turkish VLM ever!
-TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
-![TeLVE logo](<teLVE_logo.png>)
-## Model Description
-TeLVE combines:
-- 🖼️ Vision Transformer (ViT-base-patch16-224)
-- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
-- 🔄 Cross-attention mechanism for vision-language fusion
-### Version Logs
-- **TeLVE v1.0**: Trained on Unsplash Lite dataset
-- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*
-## Usage
-The model can be used in two ways:
-### Inference (imagine.py)
-```bash
-# Generate captions for images
-python imagine.py
-```
-This script:
-- Loads a trained TeLVE model
-- Takes images from `images` directory
-- Generates Turkish captions for each image
-- Outputs the results to console
-### Training (main.py)
-Users can train their own models with ViT and BERT encoders.
-```bash
-# Train a new model
-python main.py
-```
-This script:
-- Loads and preprocesses image-caption pairs
-- Initializes ViT and BERT encoders
-- Trains the combined model
-- Saves the model and tokenizer
-## Performance
-Performance scores will be evaluated.
-<!--
-| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
-|--------------|---------|---------|---------|--------|
-| TeLVE v1.0   | Unsplash | *TBD*   | *TBD*   | *TBD*  |
-| TeLVE v1.1   | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
-## Citation
-```bibtex
-@software{telve2024,
-    author = {Öğüt Su Karagün},
-    title = {TeLVE: Turkish efficient Language Vision Engine},
-    year = {2024},
-    url = {https://huggingface.co/outsu/TeLVE}
-}
-```
-## License
-This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

+---
+license: cc-by-4.0
+language:
+- en
+- tr
+tags:
+- VLM
+- image2text
+- lm
+---
+# TeLVE: Turkish efficient Language Vision Engine 🧿
+[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
+[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
+## First Turkish VLM ever!
+TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
+![TeLVE logo](<teLVE_logo.png>)
+## Model Description
+TeLVE combines:
+- 🖼️ Vision Transformer (ViT-base-patch16-224)
+- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
+- 🔄 Cross-attention mechanism for vision-language fusion
+### Version Logs
+- **TeLVE v1.0**: Trained on Unsplash Lite dataset
+- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*
+## Usage
+The model can be used in two ways:
+### Inference (imagine.py)
+```bash
+# Generate captions for images
+python imagine.py
+```
+This script:
+- Loads a trained TeLVE model
+- Takes images from `images` directory
+- Generates Turkish captions for each image
+- Outputs the results to console
+### Training (main.py)
+Users can train their own models with ViT and BERT encoders.
+```bash
+# Train a new model
+python main.py
+```
+This script:
+- Loads and preprocesses image-caption pairs
+- Initializes ViT and BERT encoders
+- Trains the combined model
+- Saves the model and tokenizer
+## Performance
+Performance scores will be evaluated.
+<!--
+| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
+|--------------|---------|---------|---------|--------|
+| TeLVE v1.0   | Unsplash | *TBD*   | *TBD*   | *TBD*  |
+| TeLVE v1.1   | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
+## Citation
+```bibtex
+@software{telve2024,
+    author = {Öğüt Su Karagün},
+    title = {TeLVE: Turkish efficient Language Vision Engine},
+    year = {2024},
+    url = {https://huggingface.co/outsu/TeLVE}
+}
+```
+## License
+<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://huggingface.co/outsu/TeLVE">TeLVE</a> by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://outsu.github.io">Öğüt Su Karagün</a> is licensed under <a href="https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">Creative Commons Attribution 4.0 International</a></p>
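
Note on the Usage section: the README describes `imagine.py` only as a list of steps (load a model, take images from the `images` directory, caption each one, print to the console). The sketch below illustrates that loop shape under stated assumptions; it is not the repository's code. The names `find_images`, `caption_directory`, `placeholder_caption`, and the `IMAGE_EXTENSIONS` set are hypothetical, and the model call is replaced by a stand-in function because the README does not document the model's Python interface.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Extensions treated as images. This set is an assumption for illustration,
# not something stated in the README or taken from imagine.py.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(directory: Path) -> List[Path]:
    """Return image files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def caption_directory(
    directory: Path, caption_fn: Callable[[Path], str]
) -> Dict[str, str]:
    """Caption every image in `directory`, print each result, and return them."""
    results: Dict[str, str] = {}
    for image_path in find_images(directory):
        caption = caption_fn(image_path)
        print(f"{image_path.name}: {caption}")  # console output, as the README describes
        results[image_path.name] = caption
    return results


def placeholder_caption(image_path: Path) -> str:
    """Hypothetical stand-in for the model call; it does not run TeLVE."""
    return f"<caption for {image_path.stem}>"
```

Calling `caption_directory(Path("images"), placeholder_caption)` walks the directory and prints one line per image. Passing the captioner as a function keeps the file handling separate from the model, so a real caption function could be swapped in without touching the loop.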