Spaces: altndrr/vic
altndrr committed on Jun 5, 2023

Commit a0fbf80, 1 parent: 26b1205

Move alpha definition in slider

Files changed (1)

app.py

@@ -32,8 +32,7 @@ unconstrained semantic space by multimodal data from large vision-language datab
 retrieve the semantically most similar captions from a database, from which we extract a set of
 candidate categories by applying text parsing and filtering techniques. We further score the
 candidates using the multimodal aligned representation of the large pre-trained VLM, *i.e.* CLIP,
-to obtain the best-matching category, using *alpha* as a hyperparameter to control the trade-off
-between the visual and textual similarity.
+to obtain the best-matching category.
 """
 PAPER_URL = "https://arxiv.org/abs/2306.00917"
@@ -67,7 +66,13 @@ demo = gr.Interface(
     fn=vic,
     inputs=[
         gr.Image(type="filepath", label="input"),
-        gr.Slider(0.0, 1.0, value=0.5, label="alpha"),
+        gr.Slider(
+            0.0,
+            1.0,
+            value=0.5,
+            label="alpha",
+            info="trade-off between the text (left) and image (right) modality",
+        ),
     ],
     outputs=[gr.Label(num_top_classes=5, label="output")],
     title=PAPER_TITLE,
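
Note on alpha: the slider hint describes alpha as a trade-off between the text (left, 0.0) and image (right, 1.0) modality. The following is a minimal sketch of how such a trade-off could combine two per-candidate scores, assuming a plain linear interpolation. The diff does not show how `vic` actually uses alpha, so the function name `blend_scores`, the formula, and the example scores are all hypothetical and not taken from app.py.

```python
def blend_scores(image_scores, text_scores, alpha=0.5):
    """Linearly interpolate per-candidate scores between two modalities.

    Assumption (not confirmed by app.py): alpha=0.0 uses only the text
    scores and alpha=1.0 only the image scores, mirroring the slider's
    "text (left) / image (right)" hint.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0.0, 1.0]")
    if image_scores.keys() != text_scores.keys():
        raise ValueError("both modalities must score the same candidates")
    return {
        name: alpha * image_scores[name] + (1.0 - alpha) * text_scores[name]
        for name in image_scores
    }


# Made-up scores for two hypothetical candidate categories.
image_scores = {"cat": 0.8, "dog": 0.2}
text_scores = {"cat": 0.3, "dog": 0.7}

blended = blend_scores(image_scores, text_scores, alpha=0.5)
best = max(blended, key=blended.get)  # candidate with the highest blended score
```

Under this assumption, moving the slider toward 0.0 lets the text scores decide the winning category, and moving it toward 1.0 lets the image scores decide.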