---
language:
- multilingual
- en
- sw
- ha
- yo
- ig
- zu
- sn
- ar
- am
- fr
- pt
tags:
- zero-shot-image-classification
- image generation
- visual qa
- text-image embedding
- image-text embedding
- pytorch
- sartify
- visual conversional ai
- image semantic retrival
- african raw resourced languages
- safetensors
- clip
license: apache-2.0
library_name: transformers
---

# AViLaMa: African Vision-Language Alignment Pre-Training Model

Learning Visual Concepts Directly From African Languages Supervision. [Click to see paper](www.sartify.com)

## Model Details

AViLaMa is a large open-source text-vision alignment pre-training model for African languages. It learns visual concepts directly from African-language supervision. It is inspired by OpenAI CLIP, but extends to additional modalities such as video and audio and adds techniques such as language-agnostic encoding and a data filtering network. It covers more than 12 African languages and is trained on the #AViLaDa-2B dataset of filtered image-, video-, and audio-text pairs. We are also working to make it directly usable in vision-to-vision tasks.

- **Developed by:** Sartify LLC (www.sartify.com)
- **Authors:** Innocent Charles, Zephania Reuben
- **Funded by:** Sartify LLC, the open-source community, and others (we always welcome other donors)
- **Model type:** multilingual, multimodal transformer
- **Language(s):** en (English), sw (Swahili), ha (Hausa), yo (Yoruba), ig (Igbo), zu (Zulu), sn (Shona), ar (Arabic), am (Amharic), fr (French), pt (Portuguese)
- **License:** Apache 2.0
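
The card says only that AViLaMa is inspired by OpenAI CLIP; it does not state the actual training objective. As a rough illustration of what CLIP-style alignment usually means, the sketch below computes a symmetric contrastive loss over a small image-by-text similarity matrix. The function names, the temperature of `0.07`, and the toy matrices are assumptions made for this example, not values taken from AViLaMa.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def symmetric_contrastive_loss(similarity, temperature=0.07):
    """CLIP-style loss for a square image-by-text similarity matrix.

    similarity[i][j] is the cosine similarity of image i and text j.
    Matching pairs are assumed to sit on the diagonal.
    temperature is an illustrative value, not AViLaMa's setting.
    """
    n = len(similarity)
    logits = [[s / temperature for s in row] for row in similarity]
    # Image-to-text direction: each row should select its own column.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column should select its own row.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy similarity matrices: matched pairs on the diagonal vs. swapped pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]
misaligned = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = symmetric_contrastive_loss(aligned)
loss_misaligned = symmetric_contrastive_loss(misaligned)
print(loss_aligned < loss_misaligned)
```

The loss is small when each image is most similar to its own caption and large when the pairs are swapped, which is the behaviour an alignment objective of this kind is meant to reward.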

## Load the model from Hugging Face

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the model weights and tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("sartifyllc/AViLaMa")
tokenizer = AutoTokenizer.from_pretrained("sartifyllc/AViLaMa")

# Switch to evaluation mode (disables dropout) for inference.
model = model.eval()
```

## Model Sources

- **Repository:** [AViLaMa-Sources](https://github.com/Sartify/AViLaMa-Sources)
- **Datasets:** Coming soon
- **Paper:** Coming soon
- **Demo:** Coming soon

### Direct & Downstream Use

1. Zero-shot semantic image retrieval and ranking tasks.
2. Zero-shot semantic audio retrieval and ranking tasks.
3. Zero-shot semantic video retrieval and ranking tasks.
4. Zero-shot image classification tasks.
5. Zero-shot video classification tasks.
6. Zero-shot audio classification tasks.
7. Visual QA tasks.
8. Visual conversational GenAI tasks.
9. Image and video captioning tasks.
10. Image and art generation guidance and conditioning tasks.
11. Text-image analysis tasks.
12. Content moderation tasks.
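
The zero-shot classification and retrieval uses listed above typically reduce to comparing embeddings in a shared space. The sketch below shows that comparison step only, under stated assumptions: the embeddings are toy 3-dimensional vectors rather than real model outputs, the helper names `cosine_similarity` and `zero_shot_classify` are hypothetical, and the temperature of `0.07` is an illustrative value. How AViLaMa itself exposes image and text embeddings is not documented in this card.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Return (label, probability) pairs sorted from most to least likely.

    label_embeddings maps each candidate label to its text embedding.
    temperature is a hypothetical scaling value, not taken from the model.
    """
    labels = list(label_embeddings)
    logits = [
        cosine_similarity(image_embedding, label_embeddings[label]) / temperature
        for label in labels
    ]
    # Numerically stable softmax over the scaled similarities.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for real image and text encoder outputs.
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "mbwa": [1.0, 0.0, 0.0],  # Swahili: dog
    "paka": [0.0, 1.0, 0.0],  # Swahili: cat
    "gari": [0.0, 0.0, 1.0],  # Swahili: car
}
ranking = zero_shot_classify(image_embedding, label_embeddings)
print(ranking[0][0])
```

The same ranking logic applies to retrieval: swap the roles so that one text embedding is scored against many image, video, or audio embeddings, and return the highest-scoring items.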

## Citation

**BibTeX:**

```bibtex
@inproceedings{sartifyllc2023,
  title={AViLaMa: Learning Visual Concepts Directly From African Languages Supervision},
  author={Innocent Charles and Zephania Reuben},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023},
  url={https://sartify.com}
}
```

AViLaMa paper:

```bibtex
@article{sartifyllc2023africanvision,
  title={AViLaMa: Learning Visual Concepts Directly From African Languages Supervision},
  author={Innocent Charles and Zephania Reuben},
  journal={To be inserted},
  year={2023}
}
```