Commit c62e9c5, committed by bhavitvyamalik
1 Parent(s): 8678313
update sections
- sections/challenges.md +1 -3
- sections/intro.md +1 -1
- sections/social_impact.md +1 -1
- sections/usage.md +1 -1
sections/challenges.md
CHANGED
@@ -5,6 +5,4 @@ We faced challenges at every step of the way, despite having some example script
 
 - The translations with deep learning models aren't as "perfect" as translation APIs like Google and Yandex. This could lead to poor performance.
 
-- We prepared the model and config classes for our model from scratch, basing it on `CLIP Vision` and `
-
-- We were only able to get around 1.5 days of training time on TPUs due to above mentioned challenges. We were unable to perform hyperparameter tuning. Our [loss curves on the pre-training model](https://huggingface.co/flax-community/spanish-image-captioning/tensorboard) show that the training hasn't converged, and we could see further improvement in the BLEU scores.
+- We prepared the model and config classes for our model from scratch, basing it on `CLIP Vision` and `Marian` implementations in Flax.
sections/intro.md
CHANGED
@@ -1,4 +1,4 @@
-This demo uses [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/
+This demo uses [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/clip-vit-base-patch32_marian-es) to predict caption for a given image in Spanish. Training was done using image encoder and text decoder with approximately 2.5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m) with captions translated using [MarianMT English to Spanish](https://huggingface.co/transformers/model_doc/marian.html).
 
 
 For more details, click on `Usage` or `Article` 🤗 below.
sections/social_impact.md
CHANGED
@@ -1,4 +1,4 @@
 ## Social Impact
 Being able to automatically describe the content of an image using properly formed sentences in any language is a challenging task, but it could have great impact by helping visually impaired people better understand their surroundings.
 
-Our initial plan was to work with a low-resource language
+Our initial plan was to work with a low-resource language only. However, the existing translations do not perform as well and we would have received poor labels and hence we did not pursue this further.
sections/usage.md
CHANGED
@@ -1,4 +1,4 @@
-- This demo loads the `FlaxCLIPVisionMarianMT` present in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-23999` which is pre-trained checkpoint with
+- This demo loads the `FlaxCLIPVisionMarianMT` present in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-23999` which is pre-trained checkpoint with 24k steps. 100 random validation set examples are present in the `references.tsv` with respective images in the `images` directory.
 
 - We provide `English Translation` of the generated caption and reference captions for users who are not well-acquainted with Spanish. This is done using `mtranslate` to keep things flexible enough and needs internet connection as it uses the Google Translate API. We will also add the original captions soon.
 
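Reviewer note on the `sections/usage.md` text: the flow it describes (pick one of the validation examples listed in `references.tsv`, then show an English translation of the Spanish caption) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the demo's actual code: the column names `image_file` and `caption`, the sample rows, and the helper names are all hypothetical, since the diff does not show the layout of `references.tsv`. The translator is passed in as a callable so the sketch runs without an internet connection.

```python
import csv
import io
import random


def load_references(tsv_text):
    """Parse tab-separated text into a list of row dicts keyed by the header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def pick_example(rows, seed=None):
    """Return one row chosen at random (seedable for reproducibility)."""
    rng = random.Random(seed)
    return rng.choice(rows)


def to_english(caption, translate_fn):
    """Translate a Spanish caption to English.

    translate_fn is any callable taking (text, to_language, from_language)
    and returning a string; it is injected so this sketch stays offline.
    """
    return translate_fn(caption, "en", "es")


# Hypothetical stand-in for references.tsv; the real columns are not shown in the diff.
SAMPLE_TSV = (
    "image_file\tcaption\n"
    "0001.jpg\tun perro en la playa\n"
    "0002.jpg\tuna taza de café\n"
)

rows = load_references(SAMPLE_TSV)
example = pick_example(rows, seed=0)
```

In the demo itself, the callable would presumably be `translate` from `mtranslate`, which, as far as I know, takes the text, the target language, and the source language in that order; that would need checking against the installed version.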