egmaminta committed
Commit c2b3dec · 1 Parent(s): fdbe9ff

Update app.py

Files changed (1)
  app.py  +2 -2
app.py CHANGED
@@ -46,14 +46,14 @@ gradio.Interface(fn=classify,
                  type='auto'),
                  theme='grass',
                  examples=[['bedroom.jpg'],
-                           ['bathroom_AS.jpg'],
+                           ['cafe_shop.jpg'],
                            ['samsung_room.jpg']],
                  live=True,
                  layout='horizontal',
                  title='Indoor Scene Recognition',
                  description='A smart and easy-to-use indoor scene classifier. Start by uploading an input image. The outputs are the top five indoor scene classes that best fit your input image.',
                  interpretation='default',
-                 article='''<h2>Additional Information</h2><p style='text-align: justify'>This indoor scene classifier employs the <b><a href='https://huggingface.co/google/vit-base-patch16-224-in21k' target='_blank'>google/vit-base-patch16-224-in21k</a></b>, a <b>Visual Transformer (ViT)</b> model pre-trained on <b>ImageNet-21k</b> (14 million images, 21,843 classes) at resolution 224x224 and was first introduced in the paper <b><a href='https://arxiv.org/abs/2010.11929' target='_blank'>An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale</a></b> by Dosovitskiy et al. It was then fine-tuned on the <b><a href='https://www.kaggle.com/itsahmad/indoor-scenes-cvpr-2019' target='_blank'>MIT Indoor Scenes</a></b> data set from Kaggle. The source model (<b>vincentclaes/mit-indoor-scenes</b>) from Hugging Face is found in <b><a href='https://huggingface.co/vincentclaes/mit-indoor-scenes' target='_blank'>this link</a></b>.</p>
+                 article='''<h2>Additional Information</h2><p style='text-align: justify'>This indoor scene classifier employs the <b><a href='https://huggingface.co/google/vit-base-patch16-224-in21k' target='_blank'>google/vit-base-patch16-224-in21k</a></b>, a <b>Visual Transformer (ViT)</b> model pre-trained on <b>ImageNet-21k</b> (14 million images, 21,843 classes) at resolution 224x224 and was first introduced in the paper <b><a href='https://arxiv.org/abs/2010.11929' target='_blank'>An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale</a></b> by Dosovitskiy et al. It was then fine-tuned on the <b><a href='https://www.kaggle.com/itsahmad/indoor-scenes-cvpr-2019' target='_blank'>MIT Indoor Scenes</a></b> data set from Kaggle. The source model is from <b><a href='https://huggingface.co/vincentclaes/mit-indoor-scenes' target='_blank'>vincentclaes/mit-indoor-scenes</a></b>.</p>
                  <p style='text-align: justify'>For further research on the Visual Transformer, the original GitHub repository is found in <b><a href='https://github.com/google-research/vision_transformer' target='_blank'>this link</a></b>.</p>
                  <h2>Disclaimer</h2>
                  <p style='text-align: justify'>The team releasing the Visual Transformer did not write a model card for it via Hugging Face. Hence, the Visual Transformer model card released in the Hugging Face Models library has been written by the Hugging Face team.</p>''',
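Note: this hunk only touches the gradio.Interface() arguments; the classify callback passed as fn=classify is not part of the diff. As a rough sketch of what that callback could look like, assuming it wraps the vincentclaes/mit-indoor-scenes checkpoint linked in the article text through the transformers image-classification pipeline (the actual code in app.py may differ):

    # Sketch only: classify() is not shown in this diff; this assumes the
    # vincentclaes/mit-indoor-scenes checkpoint and the transformers pipeline API.
    from transformers import pipeline

    # Image-classification pipeline around the fine-tuned ViT checkpoint.
    classifier = pipeline('image-classification', model='vincentclaes/mit-indoor-scenes')

    def classify(image):
        # `image` is the picture handed over by the gradio Image input (e.g. a PIL image).
        # Return a {label: score} dict so gradio's Label output can show the top five classes.
        predictions = classifier(image, top_k=5)
        return {p['label']: p['score'] for p in predictions}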