---
tags:
  - image-to-text
  - image-captioning
license: apache-2.0
widget:
  - src: >-
      https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
    example_title: Savanna
  - src: >-
      https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
    example_title: Football Match
  - src: >-
      https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
    example_title: Airport
base_model:
  - google/vit-base-patch16-224-in21k
---

This model is a work in progress.

You can find the code used to create the model here: https://github.com/mozilla/distilvit

Results after 3 epochs (and ~45 hours of training):

  • eval_loss: 0.19939416646957397
  • eval_rouge1: 43.006
  • eval_rouge2: 16.9939
  • eval_rougeL: 38.8923
  • eval_rougeLsum: 38.8877
  • eval_gen_len: 11.327256736227712
  • eval_runtime: 1816.5255
  • eval_samples_per_second: 13.77
  • eval_steps_per_second: 1.721
  • train_runtime: 46263.3695
  • train_samples_per_second: 38.373
  • train_steps_per_second: 4.797
  • train_loss: 0.05974134062104816
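
The throughput figures above can be combined into a few easier-to-read quantities. The sketch below is only arithmetic on the reported numbers: it assumes each `*_samples_per_second` and `*_steps_per_second` value is an average over the matching `*_runtime` (in seconds), which this card does not state explicitly. The `derived` helper and its key names are made up for illustration.

```python
# Metrics copied from the results list above (runtimes assumed to be in seconds).
metrics = {
    "eval_runtime": 1816.5255,
    "eval_samples_per_second": 13.77,
    "eval_steps_per_second": 1.721,
    "train_runtime": 46263.3695,
    "train_samples_per_second": 38.373,
    "train_steps_per_second": 4.797,
}


def derived(prefix: str) -> dict:
    """Derive totals for 'eval' or 'train' from runtime and throughput.

    Hypothetical helper: assumes throughput values are averages over the
    whole runtime, so runtime * rate approximates the total count.
    """
    runtime = metrics[f"{prefix}_runtime"]
    samples_per_s = metrics[f"{prefix}_samples_per_second"]
    steps_per_s = metrics[f"{prefix}_steps_per_second"]
    return {
        "hours": runtime / 3600,
        "samples": runtime * samples_per_s,
        "steps": runtime * steps_per_s,
        # Samples processed per step, i.e. the implied batch size.
        "samples_per_step": samples_per_s / steps_per_s,
    }


eval_stats = derived("eval")
train_stats = derived("train")

for name, stats in (("eval", eval_stats), ("train", train_stats)):
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Under that assumption, both phases work out to roughly 8 samples per step, and the evaluation pass covers about 25,000 samples. The reported `train_runtime` corresponds to about 12.9 hours, while the heading above mentions ~45 hours of training; the card does not say how the two figures relate.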