feat: add implementation details page
pages/losses.py +39 -0
pages/losses.py
ADDED
@@ -0,0 +1,39 @@
+import streamlit as st
+from st_pages import add_indentation
+
+add_indentation()
+
+st.title('Loss functions')
+st.subheader('SDM Loss')
+st.markdown('''
+The similarity distribution matching (SDM) loss is the KL divergence from the image-to-text
+and text-to-image similarity distributions to the label distribution.
+
+We define $f^v$ and $f^t$ to be the global representations of the visual and textual features respectively.
+The cosine similarity $sim(u, v) = \\frac{u \\cdot v}{|u||v|}$ will be used to compute the probability of the labels.
+
+We define $y_{i, j}=1$ if the visual feature $f^v_i$ matches the textual feature $f^t_j$, else $y_{i, j}=0$.
+The predicted label distribution can be formulated as''')
+st.latex(r'''
+p_{i} = \sigma(sim(f^v_i, f^t))
+''')
+
+st.markdown('''
+We can then define the image-to-text loss as
+''')
+
+st.latex(r'''
+\mathcal{L}_{i2t} = KL(\mathbf{p_i} || \mathbf{q_i})
+''')
+
+st.markdown('where $\\mathbf{q_i}$, the true probability distribution, is defined as')
+
+st.latex(r'''
+q_{i, j} = \frac{y_{i, j}}{\sum_{k=1}^{N} y_{i, k}}
+''')
+
+st.markdown('This normalization is needed because there can be multiple correct labels for a given image.')
+
+
+st.subheader('IRR (MLM) Loss')
+st.subheader('ID Loss')
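
For illustration, here is a minimal sketch of how the image-to-text term described on this page could be computed, assuming $\sigma$ denotes a row-wise softmax over the cosine similarities and that the features arrive as batched tensors. The function name `sdm_i2t_loss`, the temperature `tau`, and the `eps` guard against `log(0)` are illustrative assumptions, not taken from this commit.

```python
import torch
import torch.nn.functional as F

def sdm_i2t_loss(f_v: torch.Tensor, f_t: torch.Tensor, y: torch.Tensor,
                 tau: float = 0.02, eps: float = 1e-8) -> torch.Tensor:
    """Image-to-text SDM term: KL(p_i || q_i), averaged over the batch.

    f_v: (N, D) global visual features
    f_t: (N, D) global textual features
    y:   (N, N) binary match matrix, y[i, j] = 1 if f_v[i] matches f_t[j]
    """
    # Cosine similarity sim(u, v) = u.v / (|u| |v|), computed pairwise
    sim = F.normalize(f_v, dim=-1) @ F.normalize(f_t, dim=-1).T  # (N, N)

    # Predicted label distribution p_i: softmax over each image's row of
    # similarities (the temperature tau is an assumption, not on the page)
    p = F.softmax(sim / tau, dim=1)

    # True distribution q_i: labels normalized so each row sums to 1,
    # which handles the case of multiple correct labels per image
    q = y / (y.sum(dim=1, keepdim=True) + eps)

    # KL(p_i || q_i) = sum_j p_ij * (log p_ij - log q_ij)
    kl = (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1)
    return kl.mean()

# Example usage with random features and an identity match matrix
f_v, f_t = torch.randn(8, 512), torch.randn(8, 512)
loss = sdm_i2t_loss(f_v, f_t, torch.eye(8))
```

The text-to-image term mentioned in the description would follow the same pattern with the roles of $f^v$ and $f^t$ swapped; the total SDM loss is typically the sum of the two.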